Managing operational throughput for shared resources

ABSTRACT

Usage of shared resources can be managed by enabling users to obtain different types of guarantees at different times for various types and/or levels of resource capacity. A user can select to have an amount or rate of capacity dedicated to that user. A user can also select reserved capacity for at least a portion of the requests, tasks, or program execution for that user, where the user has priority to that capacity but other users can utilize the excess capacity during other periods. Users can alternatively specify to use the excess capacity or other variable, non-guaranteed capacity. The capacity can be for any appropriate functional aspect of a resource, such as computational capacity, throughput, latency, bandwidth, and storage. Users can submit bids for various types and combinations of excess capacity, and winning bids can receive dedicated use of the excess capacity for at least a period of time.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a divisional application of parent U.S. patent application Ser. No. 12/882,082, filed on Sep. 14, 2010, entitled “MANAGING OPERATIONAL THROUGHPUT FOR SHARED RESOURCES,” which is hereby incorporated herein by reference in its entirety.

BACKGROUND

As an increasing number of applications and services are being made available over networks such as the Internet, an increasing number of content, application, and/or service providers are turning to technologies such as remote resource sharing and cloud computing. Cloud computing, in general, is an approach to providing access to electronic resources through services, such as Web services, where the hardware and/or software used to support those services is dynamically scalable to meet the needs of the services at any given time. A user or customer typically will rent, lease, or otherwise pay for access to resources through the cloud, and thus does not have to purchase and maintain the hardware and/or software to provide access to these resources.

In some environments, multiple users can share resources such as remote servers and data repositories, wherein the users can concurrently send multiple requests to be executed against the same resource. Problems can arise, however, since there is a limited amount of capacity for each type of resource. Conventional systems address these problems by providing dedicated resources to users and/or purchasing additional capacity, but such approaches are expensive and often result in unused excess capacity. Further, each resource can have more than one type of capacity, such as a compute capacity, a throughput limit, an available bandwidth, and other such aspects. Since conventional systems do not optimize the usage of various types of resource capacity for shared resources, there often is excess capacity in one or more of these capacity types even if one or more other types of capacity are being substantially fully utilized.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 illustrates an environment in which various embodiments can be implemented;

FIG. 2 illustrates an example separation of management and host components that can be used in accordance with various embodiments;

FIG. 3 illustrates an example allocation for multiple customers that can be used in accordance with various embodiments;

FIG. 4 illustrates an example allocation across multiple resource instances that can be used in accordance with various embodiments;

FIG. 5 illustrates an example process for fulfilling a request in accordance with one embodiment;

FIGS. 6(a) and 6(b) illustrate approaches that can be used for accepting bids in accordance with various embodiments;

FIGS. 7(a) and 7(b) illustrate example bid sets that can be provided in accordance with various embodiments;

FIGS. 8(a) and 8(b) illustrate time windowing approaches for bandwidth guarantees that can be used in accordance with various embodiments;

FIG. 9 illustrates components useful for shifting data between devices providing differing levels of latency that can be used in accordance with various embodiments;

FIG. 10 illustrates an example process for maintaining user latency near a target latency value that can be used in accordance with various embodiments; and

FIG. 11 illustrates an example environment that can take advantage of functionality of the various embodiments.

DETAILED DESCRIPTION

Systems and methods in accordance with various embodiments of the present disclosure may overcome one or more of the aforementioned and other deficiencies experienced in conventional approaches to managing aspects of resource sharing and allocation in an electronic environment. For example, various embodiments enable users to request a specific quality of service or level of processing for each of a plurality of different resource aspects, such as a guaranteed and/or committed amount of: throughput, bandwidth, latency, processing capacity, and/or storage capacity for a given resource. The requested amount(s) can be any appropriate amount, which can be less or greater than the total amount provided by any single instance or device of the respective resource, providing improved granularity over that which is possible with conventional approaches. Multiple customers can be assigned to a single resource, such as a data server or data store, with each of the customers potentially receiving at least one guaranteed level of service over at least a specified period of time. By managing the allocations for customers on various resources at different times according to different functional aspects, customers can obtain resource usage that provides desired levels of performance for one or more aspects of a resource, during the times when those levels are needed, but minimizes the cost to the user that would otherwise be associated with dedicated capacity and/or hardware for those users.

Certain terms are used herein for purposes of clarity of explanation, but it should be understood that such terms in the various examples are not intended to be interpreted as limitations on those examples or the various embodiments. For example, it should be understood that terms such as “user” and “customer” are used substantially interchangeably herein, as a user of a management system or service as discussed herein may or may not be a paying customer or subscriber of the service, etc. Further, there can be multiple types of requests at various stages, locations, or other portions of the various embodiments, such as requests from a user to a control plane to obtain or purchase an amount of reserved or dedicated capacity (or a right to create future instances or volumes), requests from a user to a control plane to launch an instance, create a volume, or otherwise invoke that reserved capacity, and requests from applications in the data plane to perform specific operations against a particular instance, volume, or capacity, among others. For purposes of clarity, requests to obtain the rights to create future instances or volumes, or perform similar actions, will be referred to herein as “reservation requests.” Requests to launch an instance or create a volume per those rights, or perform similar actions, will be referred to as “instance requests.” Requests from applications or other sources to be executed or processed against user instances, volumes, etc., will be referred to herein as “data requests” or “I/O operations.” It should be understood, however, that these terms are used only for convenience of explanation and are not intended to imply that those types of requests are limited in nature to the type of operation indicated by the specific name of the request, as an “instance request” might not require that a new instance be launched, but can relate in some other way to providing specific capacity or resources for a user, or for a similar such purpose.

Customers in various embodiments can be provided with different types of resource capacity guarantees. For example, a customer might want requests to be processed with an average or maximum amount of latency and with a specific amount of throughput. Various embodiments can place users on common resources based on various combinations of these and other such factors. For example, a user who needs a lot of storage that will rarely be accessed can be placed on a resource with a user who needs very little storage but will require a lot of throughput to frequently access that data. Further, users with high specific capacity requests can be given priority over other users when resources with that higher capacity become available. Various other ways of selecting user requests to process on various resources are discussed with respect to the various embodiments.
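As one non-limiting illustration of placing users with complementary needs on a common resource, the following sketch uses a simple first-fit check across two capacity types. The names `Demand`, `Resource`, and `place`, the two capacity dimensions chosen, and the first-fit policy itself are assumptions made for this example only and are not prescribed by the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Demand:
    """Hypothetical per-user capacity request (illustrative only)."""
    user: str
    storage_gb: int
    iops: int


@dataclass
class Resource:
    """Hypothetical shared resource with two independent capacity types."""
    name: str
    storage_gb: int
    iops: int
    tenants: List[Demand] = field(default_factory=list)

    def fits(self, demand: Demand) -> bool:
        # A demand fits only if every capacity type still has headroom.
        used_storage = sum(t.storage_gb for t in self.tenants)
        used_iops = sum(t.iops for t in self.tenants)
        return (used_storage + demand.storage_gb <= self.storage_gb
                and used_iops + demand.iops <= self.iops)


def place(demands: List[Demand],
          resources: List[Resource]) -> Dict[str, Optional[str]]:
    """First-fit placement: assign each demand to the first resource that fits."""
    placement: Dict[str, Optional[str]] = {}
    for demand in demands:
        placement[demand.user] = None  # stays None if no resource can host it
        for resource in resources:
            if resource.fits(demand):
                resource.tenants.append(demand)
                placement[demand.user] = resource.name
                break
    return placement
```

Under these assumptions, a storage-heavy user and a throughput-heavy user can share one resource because neither exhausts the capacity type that the other needs, while a third user is declined once either capacity type would be exceeded.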

Further, a customer can be enabled to request different qualities of service, or different types of guarantees, at different times. For example, a customer might request a higher level of throughput at certain times of the day, but when the resource will not be used at the same level the customer might want a lower level of service. Various embodiments enable a customer to schedule different qualities of service throughout the day, or request a first quality of service up to a certain amount and a second quality of service for any requests in excess of that amount. In some embodiments, customers can request a dedicated amount of capacity for one or more resource types that will always be available to that customer. In some embodiments, a customer can also request dedicated reserved capacity that can come at a lower cost, and can enable the customer to use that capacity when needed, as well as to enable other users to utilize that capacity when not being used by the customer having the reserved capacity.

Customers then can also utilize the unused or excess capacity from dedicated, reserved, or other such resource capacity. In many embodiments, customers can “bid” to use the excess capacity. For example, a customer can submit an instance request with a bid price and a specification of at least one resource guarantee to be provided for the request, such as a minimum throughput, compute capacity, etc. If a resource becomes available that meets the capacity requirement(s) for the instance request, if the bid exceeds any other requests (or otherwise has preference or priority), and if the bid at least meets a current market price for that capacity, the instance request can be processed using the excess capacity. In various embodiments, the customer with the winning bid will obtain dedicated use of that excess capacity for at least a period of time to process I/O operations associated with the instance created per the instance request. After that minimum time, the bid amount can be reexamined and, if the request no longer meets the winning criteria discussed above, or some other such criteria, fulfilling of the instance request for that user on that resource can be terminated (e.g., the instance can be terminated on that resource). Further, if the capacity is excess capacity reserved or dedicated to another user, the customer can be kicked off the resource at any time if the dedicated or reserved customer resumes using that resource.
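The three winning criteria described above (the available capacity meets the requested guarantee, the bid is the highest among eligible requests, and the bid at least meets the current market price) can be sketched as follows. This is a minimal illustration only; the `Bid` type, the use of IOPS as the single guaranteed capacity type, and the function names are assumptions for the example and do not reflect any particular embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    """Hypothetical instance request carrying a bid price and one guarantee."""
    user: str
    price: float   # bid price per unit of time (illustrative units)
    min_iops: int  # minimum throughput the user wants guaranteed


def select_winner(bids: List[Bid], available_iops: int,
                  market_price: float) -> Optional[Bid]:
    """Return the highest eligible bid, or None if no bid qualifies."""
    eligible = [
        b for b in bids
        if b.min_iops <= available_iops  # capacity requirement can be met
        and b.price >= market_price      # bid at least meets market price
    ]
    if not eligible:
        return None
    # Highest price wins; on ties max() keeps the earliest bid in the list.
    return max(eligible, key=lambda b: b.price)


def still_winning(current: Bid, bids: List[Bid], capacity_iops: int,
                  market_price: float) -> bool:
    """Re-examine a winning bid after its minimum period has elapsed.

    `capacity_iops` is the excess capacity under consideration, including
    the portion the current winner occupies.
    """
    winner = select_winner(bids, capacity_iops, market_price)
    return winner is not None and winner.user == current.user
```

In this sketch, a bid that won at one market price can fail re-examination once the market price rises above it, at which point the instance could be terminated on that resource as described above.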

In some cases a customer might exceed the customer's dedicated or reserved capacity. In such cases, the customer might elect to submit bids in an attempt to process the excess requests with excess resource capacity. If excess capacity is not available, or if the customer does not wish to exceed a certain cost point, the customer can elect to submit an instance request as a standard request which can be processed with variable and/or on-demand capacity that may not come with any guarantees. Such a request will only be processed if variable capacity is available, and may be limited to the types of capacity available.

The time at which certain guarantees are provided also can vary between embodiments. For example, a customer might request a certain quality of service, such as a certain amount of bandwidth, at a certain time of day, with a different guarantee (or no guarantee) at other times of day. In other embodiments, guarantees or dedicated capacity might be provided using one or more sliding time windows, wherein a customer is guaranteed to get a certain amount of time (e.g., twenty minutes or an hour) every day with at least one resource guarantee, but the system might determine when during the course of the day to provide that functionality. A customer might be charged less for a sliding window approach than a fixed time approach, as the system can provide the resource when there is a lower load on the system or there is otherwise more excess capacity to use to provide the guarantee, which can reduce the total resource capacity that the system or service must provide.
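As a hedged illustration of how a system might determine when to provide a sliding-window guarantee, the sketch below selects the contiguous run of time slots with the lowest total forecast load. The per-slot load forecast, the slot granularity, and the function name `pick_window` are assumptions introduced for this example; the embodiments do not specify a selection method.

```python
from typing import Sequence


def pick_window(load_by_slot: Sequence[float], window_slots: int) -> int:
    """Return the start index of the contiguous window with the lowest load.

    `load_by_slot` is a hypothetical forecast of system load for each time
    slot in a day; `window_slots` is the guaranteed duration in slots.
    """
    if window_slots <= 0 or window_slots > len(load_by_slot):
        raise ValueError("window must fit within the forecast")

    # Running sum over a sliding window, updated in O(1) per step.
    current = sum(load_by_slot[:window_slots])
    best_start, best_load = 0, current
    for start in range(1, len(load_by_slot) - window_slots + 1):
        current += load_by_slot[start + window_slots - 1]
        current -= load_by_slot[start - 1]
        if current < best_load:
            best_start, best_load = start, current
    return best_start
```

Choosing the lowest-load window is one way the system could serve the guarantee from capacity that would otherwise sit idle, which is consistent with the lower charge described for the sliding window approach.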

In some embodiments, a private pool of excess resource capacity of one or more resource capacity types can be maintained for, and associated with, a customer user, such as may be based on currently unused resource capacity that has been allocated for dedicated use by that customer, with the private pool of excess resource capacity being available for priority use by the customer. Such private excess resource capacity pools may further be provided to a general, non-private excess capacity pool that is available for use by various users, which can also include users who are associated with the private excess capacity pool(s). The usage of the resources can, in some embodiments, be managed using a program execution service (“PES”) that executes multiple programs or otherwise processes requests or tasks on behalf of multiple customers or subscribers. The PES can provide a plurality of resource nodes (e.g., multiple physical computing systems and/or virtual machines that are hosted on one or more physical computing systems) and other such resources for executing user programs and fulfilling user requests.

In some embodiments, at least some excess or otherwise unused resource capacity of a PES or other group of resources may be made available to users on a temporary or non-guaranteed basis, such that the excess resource capacity can be allocated to other users until a time that the capacity is desired for other purposes (e.g., for preferential or reserved use). Such excess capacity may, for example, be made available as part of one or more general excess capacity pools that are available for use by various users, such as via a spot market with dynamically changing pricing to reflect supply and demand. In some cases, one or more programs may be executing on behalf of a user using excess resource capacity at the time that the excess resource capacity is desired for other purposes, and, in some such cases, the use of that excess capacity (e.g., storage for that user in the excess capacity) may be automatically terminated (e.g., deleted) by the PES in order to make that excess capacity available for the other purposes. In at least some embodiments, the user requests or operations can be automatically restarted at a future time, such as when a sufficient amount of excess capacity again becomes available for such purposes. Alternatively, other resource capacity may be identified and used in place of the excess resource capacity that is desired for the other purposes, so as to enable the operations relying on the excess resource capacity to continue to be processed or otherwise fulfilled.

In some embodiments, at least some of the available resource capacity can be allocated to one or more users for preferential use by those users, such that each of those users has priority access relative to other users to use a respective amount of the resource capacity. For example, the priority access of the users may be based on each of the users having dedicated or exclusive access to use the respective amount of resource capacity (e.g., each user having one or more dedicated resources and/or portions thereof that are allocated for reserved or guaranteed use by the user). In at least some such embodiments, a respective amount of resource capacity may be allocated to a particular user for dedicated access over a specified period of time, such as in a manner analogous to a lease of one or more physical computing systems so that the respective amount of resource capacity may be available to the user throughout the specified period of time. In addition, a user may be given preferential or other dedicated access to resource capacity based on one or more factors, such as fees paid by the user, an agreement to a contractual obligation for using the dedicated access for a period of time and/or subject to other conditions, etc.

If a user has a private pool of excess resource capacity and there is a separate general pool of excess resource capacity that is also available, the different excess resource capacity pools may be used in various manners. For example, if such a user makes an instance request to use excess resource capacity, the instance request may first be satisfied using that user's private pool if the pool has sufficient computing capacity for the request, and otherwise the request may be considered for satisfaction by the general excess capacity pool along with instance requests from other users. Similarly, if one or more first programs for such a user are using the user's private pool of excess capacity, and that excess capacity is desired by the user for other purposes (e.g., to store information for other second programs for the user as part of the user's dedicated computing capacity), the use by the one or more first programs may in some embodiments automatically be moved to the general excess capacity pool.
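The private-then-general ordering described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Pool` type, its unit-based accounting, and the `satisfy` function are hypothetical, and the competition with instance requests from other users in the general pool (for example, through bidding) is deliberately omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Hypothetical excess-capacity pool tracked in abstract capacity units."""
    name: str
    free_units: int

    def try_allocate(self, units: int) -> bool:
        # Allocate only if the pool has enough free capacity.
        if units <= self.free_units:
            self.free_units -= units
            return True
        return False


def satisfy(units: int, private: Pool, general: Pool) -> Optional[str]:
    """Try the user's private pool first, then fall back to the general pool.

    Returns the name of the pool that served the request, or None if
    neither pool currently has sufficient capacity.
    """
    if private.try_allocate(units):
        return private.name
    if general.try_allocate(units):
        return general.name
    return None
```

A request that returns None in this sketch corresponds to an instance request that cannot currently be satisfied from excess capacity and might instead be denied, postponed, or submitted against another capacity type.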

In addition to the types of dedicated, reserved, and excess resource capacity capabilities discussed above, a customer in various embodiments may also be able to utilize on-demand variable resource capacity that is available to satisfy at least some dynamically received requests from users, whether the requests are to be instance requests to be processed immediately upon receipt, reservation requests for an indicated future time or at some time during an indicated future time period, etc. Such a request can be processed if resources sufficient to satisfy the request are available at (or near) the requested time, but without such a request being guaranteed to be satisfied (i.e., without sufficient resources being guaranteed to be available). For example, after an on-demand variable resource capacity instance request is received for immediate execution, the instance request may be processed for the user if an appropriate amount of on-demand resource capacity is currently available, and otherwise the instance request may be denied (or in some cases, postponed). Thus, in some such embodiments, such a request for on-demand variable capacity may be unsuccessful, such as if the appropriate amount of capacity is not available at the time of the requested execution.

Furthermore, in embodiments in which a PES provides multiple types and/or levels of dedicated resource capacity, excess resource capacity, and on-demand variable resource capacity to users for fees, the fees associated with the different types of capacity may differ in various manners, such as to reflect associated availability guarantees and/or other factors. As one example, the overall cost for a user to receive a guaranteed rate of 1,000 IOPS may be higher than the cost to the user to receive a rate of 1,000 IOPS using on-demand variable capacity (if available), and that cost to the user to use the on-demand variable capacity may be higher than the cost to the user to make use of a comparable amount of excess capacity (if available) from a general pool. In some cases, however, the cost of using dedicated capacity may include a one-time or periodic fee that is not based on actual use, and a separate ongoing incremental cost for a user to make use of a particular amount of dedicated capacity for a particular amount of time, with that ongoing incremental cost for a particular amount of dedicated capacity use optionally being less than the cost for using a comparable amount of general excess capacity pool for that period of time. Furthermore, as noted above, costs for using a private excess capacity pool may differ from those of using a general excess capacity pool, such as to be the same as the ongoing incremental cost for dedicated capacity use. Various other possibilities are contemplated within the scope of the various embodiments described and suggested below.

Systems and methods in accordance with various embodiments are operable to manage access to resources such as data storage and data servers. In at least some embodiments, these approaches include providing a block data storage service that uses multiple server storage systems to reliably store block data that may be accessed and used over one or more networks by any of various users, applications, processes, and/or services. Users of the block data storage service may each create one or more block data storage volumes that each have a specified amount of block data storage space, and may initiate use of such a block data storage volume (also referred to as a “volume” herein) by one or more executing programs, with at least some such volumes having copies stored by two or more of the multiple server storage systems so as to enhance volume reliability and availability to the executing programs. As one example, the multiple server block data storage systems that store block data may in some embodiments be organized into one or more pools or other groups that each have multiple physical server storage systems co-located at a geographical location, such as in each of one or more geographically distributed data centers, and the program(s) that use a volume stored on a server block data storage system in a data center may execute on one or more other physical computing systems at that data center.

In addition, in at least some embodiments, applications that access and use one or more such non-local block data storage volumes over one or more networks may each have an associated node manager that manages the access to those non-local volumes by the program, such as a node manager module that is provided by the block data storage service and/or that operates in conjunction with one or more Block Data Service (BDS) System Manager modules. For example, a first user who is a customer of the block data storage service may create a first block data storage volume, and execute one or more program copies on one or more resource nodes that are instructed to access and use the first volume (e.g., in a serial manner, in a simultaneous or other overlapping manner, etc.). When an application executing on a resource node initiates use of a non-local volume, the application may mount or otherwise be provided with a logical block data storage device that is local to the resource node and that represents the non-local volume, such as to allow the executing program to interact with the local logical block data storage device in the same manner as any other local hard drive or other physical block data storage device that is attached to the resource node (e.g., to perform read and write data access requests, to implement a file system or database or other higher-level data structure on the volume, etc.). For example, in at least some embodiments, a representative logical local block data storage device may be made available to an executing program via use of an appropriate technology, such as GNBD (“Global Network Block Device”) technology.
In addition, when an application interacts with the representative local logical block data storage device, the associated node manager may manage those interactions by communicating over one or more networks with at least one of the server block data storage systems that stores a copy of the associated non-local volume (e.g., in a manner transparent to the executing program and/or resource node) so as to perform the interactions on that stored volume copy on behalf of the executing program. Furthermore, in at least some embodiments, at least some of the described techniques for managing access of applications and services to non-local block data storage volumes are automatically performed by embodiments of a Node Manager module.

In at least some embodiments, block data storage volumes (or portions of those volumes) may further be stored on one or more remote archival storage systems that are distinct from the server block data storage systems used to store volume copies. In various embodiments, the one or more remote archival storage systems may be provided by the block data storage service (e.g., at a location remote from a data center or other geographical location that has a pool of co-located server block data storage systems), or instead may be provided by a remote long-term storage service and used by the block data storage service, and in at least some embodiments the archival storage system may store data in a format other than block data (e.g., may store one or more chunks or portions of a volume as distinct objects).

In some embodiments, at least some of the described techniques are performed on behalf of a program execution service that manages execution of multiple programs on behalf of multiple users of the program execution service. In some embodiments, the program execution service may have groups of multiple co-located physical host computing systems, and may execute users' programs on those physical host computing systems, such as under control of a PES system manager, as discussed in greater detail below. In such embodiments, users of the program execution service (e.g., customers of the program execution service who pay fees to use the program execution service) who are also users of the block data storage service may execute programs that access and use non-local block data storage volumes provided via the block data storage service. In other embodiments, a single organization may provide at least some of both program execution service capabilities and block data storage service capabilities (e.g., in an integrated manner, such as part of a single service), while in yet other embodiments the block data storage service may be provided in environments that do not include a program execution service (e.g., internally to a business or other organization to support operations of the organization).

In addition, the host computing systems on which programs execute may have various forms in various embodiments. Multiple such host computing systems may, for example, be co-located in a physical location (e.g., a data center), and may be managed by multiple node manager modules that are each associated with a subset of one or more of the host computing systems. At least some of the host computing systems may each include sufficient computing resources (e.g., volatile memory, CPU cycles or other CPU usage measure, network bandwidth, swap space, etc.) to execute multiple programs simultaneously, and, in at least some embodiments, some or all of the computing systems may each have one or more physically attached local block data storage devices (e.g., hard disks, tape drives, etc.) that can be used to store local copies of programs to be executed and/or data used by such programs. Furthermore, at least some of the host computing systems in some such embodiments may each host multiple virtual machine resource nodes that each may execute one or more programs on behalf of a distinct user, with each such host computing system having an executing hypervisor or other virtual machine monitor that manages the virtual machines for that host computing system. For host computing systems that execute multiple virtual machines, the associated node manager module for the host computing system may in some embodiments execute on at least one of multiple hosted virtual machines (e.g., as part of or in conjunction with the virtual machine monitor for the host computing system), while in other situations a node manager may execute on a physical computing system distinct from one or more other host computing systems being managed.

The server block data storage systems on which volumes are stored may also have various forms in various embodiments. In at least some embodiments, some or all of the server block data storage systems may be physical computing systems similar to the host computing systems that execute programs, and in some such embodiments may each execute server storage system software to assist in the provision and maintenance of volumes on those server storage systems. For example, in at least some embodiments, one or more of such server block data storage computing systems may execute at least part of the BDS System Manager, such as if one or more BDS System Manager modules are provided in a distributed peer-to-peer manner by multiple interacting server block data storage computing systems. In other embodiments, at least some of the server block data storage systems may be network storage devices that may lack some I/O components and/or other components of physical computing systems, such as if at least some of the provision and maintenance of volumes on those server storage systems is performed by other remote physical computing systems (e.g., by a BDS System Manager module executing on one or more other computing systems). In addition, in some embodiments, at least some server block data storage systems each maintains multiple local hard disks, and stripes at least some volumes across a portion of each of some or all of the local hard disks. Furthermore, various types of techniques for creating and using volumes may be used, including in some embodiments to use LVM (“Logical Volume Manager”) technology.

In at least some embodiments, some or all block data storage volumes each have copies stored on two or more distinct server block data storage systems, such as to enhance reliability and availability of the volumes. By doing so, failure of a single server block data storage system may not cause access of executing programs to a volume to be lost, as use of that volume by those executing programs may be switched to another available server block data storage system that has a copy of that volume. In such embodiments, consistency may be maintained between the multiple copies of a volume on the multiple server block data storage systems in various ways. For example, in some embodiments, one of the server block data storage systems is designated as storing the primary copy of the volume, and the other one or more server block data storage systems are designated as storing mirror copies of the volume. In such embodiments, the server block data storage system that has the primary volume copy (referred to as the “primary server block data storage system” for the volume) may receive and handle data access requests for the volume, and in some such embodiments may further take action to maintain the consistency of the other mirror volume copies (e.g., by sending update messages to the other server block data storage systems that provide the mirror volume copies when data in the primary volume copy is modified, such as in a master-slave computing relationship manner). Various types of volume consistency techniques may be used, with additional details included below.
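A minimal in-memory sketch of the primary and mirror arrangement described above follows. The class names, the synchronous propagation of every write to all mirrors, and the promotion of the first mirror on failure are assumptions chosen for brevity; actual embodiments may use other consistency techniques and communicate over one or more networks.

```python
from typing import Dict, List


class VolumeCopy:
    """Hypothetical copy of a volume held by one server storage system."""

    def __init__(self, server: str) -> None:
        self.server = server
        self.blocks: Dict[int, bytes] = {}


class ReplicatedVolume:
    """Routes data access requests to the primary and updates the mirrors."""

    def __init__(self, primary: VolumeCopy, mirrors: List[VolumeCopy]) -> None:
        self.primary = primary
        self.mirrors = list(mirrors)

    def write(self, block_id: int, data: bytes) -> None:
        # The primary handles the write, then sends the update to each mirror.
        self.primary.blocks[block_id] = data
        for mirror in self.mirrors:
            mirror.blocks[block_id] = data

    def read(self, block_id: int) -> bytes:
        # Reads are served by the primary copy.
        return self.primary.blocks[block_id]

    def fail_over(self) -> None:
        """Switch use of the volume to a mirror after the primary is lost."""
        if not self.mirrors:
            raise RuntimeError("no mirror copy available to promote")
        self.primary = self.mirrors.pop(0)
```

Because each write in this sketch reaches every mirror before returning, a read issued after `fail_over()` returns the same data that the failed primary held.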

In addition to maintaining reliable and available access of executing programs to block data storage volumes by moving or otherwise replicating volume copies when server block data storage systems become unavailable, the block data storage service may perform other actions in other situations to maintain access of executing programs to block data storage volumes. For example, if a first executing program unexpectedly becomes unavailable, in some embodiments the block data storage service and/or program execution service may take actions to have a different second executing program (e.g., a second copy of the same program that is executing on a different host computing system) attach to some or all block data storage volumes that were in use by the unavailable first program, so that the second program can quickly take over at least some operations of the unavailable first program. The second program may in some situations be a new program whose execution is initiated by the unavailability of the existing first program, while in other situations the second program may already be executing (e.g., if multiple program copies are concurrently executed to share an overall load of work, such as multiple Web server programs that receive different incoming client requests as mediated by a load balancer, with one of the multiple program copies being selected to be the second program; if the second program is a standby copy of the program that is executing to allow a “hot” swap from the existing first program in the event of unavailability, such as without the standby program copy being actively used until the unavailability of the existing first program occurs; etc.).
In addition, in some embodiments, a second program to which anexisting volume's attachment and ongoing use is switched may be onanother host physical computing system in the same geographical location(e.g., the same data center) as the first program, while in otherembodiments the second program may be at a different geographicallocation (e.g., a different data center, such as in conjunction with acopy of the volume that was previously or concurrently moved to thatother data center and will be used by that second program). Furthermore,in some embodiments, other related actions may be taken to furtherfacilitate the switch to the second program, such as by redirecting somecommunications intended for the unavailable first program to the secondprogram.

As previously noted, in at least some embodiments, some or all block data storage volumes each have copies stored on two or more distinct server block data storage systems at a single geographical location, such as within the same data center in which executing programs will access the volume. By locating all of the volume copies and executing programs at the same data center or other geographical location, various desired data access characteristics may be maintained (e.g., based on one or more internal networks at that data center or other geographical location), such as latency and throughput. For example, in at least some embodiments, the described techniques may provide access to non-local block data storage that has access characteristics that are similar to or better than access characteristics of local physical block data storage devices, but with much greater reliability that is similar to or exceeds reliability characteristics of RAID (“Redundant Array of Independent (or Inexpensive) Disks”) systems and/or dedicated SANs (“Storage Area Networks”) and at much lower cost. In other embodiments, the primary and mirror copies for at least some volumes may instead be stored in other manners, such as at different geographical locations (e.g., different data centers), such as to further maintain availability of a volume even if an entire data center becomes unavailable. In embodiments in which volume copies may be stored at different geographical locations, a user may in some situations request that a particular program be executed proximate to a particular volume (e.g., at the same data center at which the primary volume copy is located), or that a particular volume be located proximate to a particular executing program, such as to provide relatively high network bandwidth and low latency for communications between the executing program and primary volume copy.

Furthermore, access to some or all of the described techniques may in some embodiments be provided in a fee-based or other paid manner to at least some users. For example, users may pay one-time fees, periodic (e.g., monthly) fees and/or one or more types of usage-based fees to use the block data storage service to store and access volumes, to use the program execution service to execute programs, and/or to use archival storage systems (e.g., provided by a remote long-term storage service) to store long-term backups or other snapshot copies of volumes. Fees may be based on one or more factors and activities, such as indicated in the following non-exclusive list: based on the size of a volume, such as to create the volume (e.g., as a one-time fee), to have ongoing storage and/or use of the volume (e.g., a monthly fee), etc.; based on non-size characteristics of a volume, such as a number of mirror copies, characteristics of server block data storage systems (e.g., data access rates, storage sizes, etc.) on which the primary and/or mirror volume copies are stored, and/or a manner in which the volume is created (e.g., a new volume that is empty, a new volume that is a copy of an existing volume, a new volume that is a copy of a snapshot volume copy, etc.); based on the size of a snapshot volume copy, such as to create the snapshot volume copy (e.g., as a one-time fee) and/or have ongoing storage of the volume (e.g., a monthly fee); based on the non-size characteristics of one or more snapshot volume copies, such as a number of snapshots of a single volume, whether a snapshot copy is incremental with respect to one or more prior snapshot copies, etc.; based on usage of a volume, such as the amount of data transferred to and/or from a volume (e.g., to reflect an amount of network bandwidth used), a number of data access requests sent to a volume, a number of executing programs that attach to and use a volume (whether sequentially or concurrently), etc.; based on the amount of data transferred to and/or from a snapshot, such as in a manner similar to that for volumes; etc. In addition, the provided access may have various forms in various embodiments, such as a one-time purchase fee, an ongoing rental fee, and/or based on another ongoing subscription basis. Furthermore, in at least some embodiments and situations, a first group of one or more users may provide data to other users on a fee-based basis, such as to charge the other users for receiving access to current volumes and/or historical snapshot volume copies created by one or more users of the first group (e.g., by allowing them to make new volumes that are copies of volumes and/or of snapshot volume copies; by allowing them to use one or more created volumes; etc.), whether as a one-time purchase fee, an ongoing rental fee, or on another ongoing subscription basis.

In some embodiments, one or more application programming interfaces(APIs) may be provided by the block data storage service, programexecution service and/or remote long-term storage service, such as toallow other programs to programmatically initiate various types ofoperations to be performed (e.g., as directed by users of the otherprograms). Such operations may allow some or all of the previouslydescribed types of functionality to be invoked, and include, but are notlimited to, the following types of operations: to create, delete,attach, detach, or describe volumes; to create, delete, copy or describesnapshots; to specify access rights or other metadata for volumes and/orsnapshots; to manage execution of programs; to provide payment to obtainother types of functionality; to obtain reports and other informationabout use of capabilities of one or more of the services and/or aboutfees paid or owed for such use; etc. The operations provided by the APImay be invoked by, for example, executing programs on host computingsystems of the program execution service and/or by computing systems ofcustomers or other users that are external to the one or moregeographical locations used by the

FIG. 1 illustrates an example network configuration 100 in which multiple computing systems are operable to execute various programs, applications, and/or services, and further operable to access reliable non-local block data storage, such as under the control of a block data storage service and/or program execution service, in accordance with various embodiments. In particular, in this example, a program execution service manages the execution of programs on various host computing systems located within a data center 102, and a block data storage service uses multiple other server block data storage systems at the data center to provide reliable non-local block data storage to those executing programs. Multiple remote archival storage systems external to the data center may also be used to store additional copies of at least some portions of at least some block data storage volumes.

In this example, a data center 102 includes a number of racks 104, each rack including a number of host computing systems 106, as well as an optional rack support computing system 134 in this example embodiment. The host computing systems 106 on the illustrated rack 104 each host one or more virtual machines 110 in this example, as well as a distinct Node Manager module 108 associated with the virtual machines on that host computing system to manage those virtual machines. One or more other host computing systems 116 may also each host one or more virtual machines 110 in this example. Each virtual machine 110 may act as an independent resource node for executing one or more program copies (not shown) for a user (not shown), such as a customer of the program execution service, or performing another such action or process for user data requests, I/O operations, etc. In addition, this example data center 102 further includes additional host computing systems 114 that do not include distinct virtual machines, but may nonetheless each act as a resource node for one or more programs (not shown) being executed for a user. In this example, a Node Manager module 112 executing on a computing system (not shown) distinct from the host computing systems 114 and 116 is associated with those host computing systems to manage the resource nodes provided by those host computing systems, such as in a manner similar to the Node Manager modules 108 for the host computing systems 106. The rack support computing system 134 may provide various utility services for other computing systems local to its rack 104 (e.g., long-term program storage, metering, and other monitoring of program execution and/or of non-local block data storage access performed by other computing systems local to the rack, etc.), as well as possibly to other computing systems located in the data center. Each computing system may also have one or more local attached storage devices (not shown), such as to store local copies of programs and/or data created by or otherwise used by the executing programs, as well as various other components.

In this example, an optional computing system 118 is also illustrated that executes a PES System Manager module for the program execution service to assist in managing the execution of programs on the resource nodes provided by the host computing systems located within the data center (or optionally on computing systems located in one or more other data centers 128, or other remote computing systems 132 external to the data center). As discussed in greater detail elsewhere, a PES System Manager module may provide a variety of services in addition to managing execution of programs, including the management of user accounts (e.g., creation, deletion, billing, etc.); the registration, storage, and distribution of programs to be executed; the collection and processing of performance and auditing data related to the execution of programs; the obtaining of payment from customers or other users for the execution of programs; etc. In some embodiments, the PES System Manager module may coordinate with the Node Manager modules 108 and 112 to manage program execution on resource nodes associated with the Node Manager modules, while in other embodiments the Node Manager modules may not assist in managing such execution of programs.

In this example, the data center 102 also includes a computing system 124 that executes a Block Data Storage (“BDS”) System Manager module for the block data storage service to assist in managing the availability of non-local block data storage to programs executing on resource nodes provided by the host computing systems located within the data center (or optionally on computing systems located in one or more other data centers 128, or other remote computing systems 132 external to the data center). In particular, in this example, the data center 102 includes a pool of multiple server block data storage systems 122, which each have local block storage for use in storing one or more volume copies 120. Access to the volume copies 120 is provided over the internal network(s) 126 to programs executing on various resource nodes 110 and 114. As discussed in greater detail elsewhere, a BDS System Manager module may provide a variety of services related to providing non-local block data storage functionality, including the management of user accounts (e.g., creation, deletion, billing, etc.); the creation, use and deletion of block data storage volumes and snapshot copies of those volumes; the collection and processing of performance and auditing data related to the use of block data storage volumes and snapshot copies of those volumes; the obtaining of payment from customers or other users for the use of block data storage volumes and snapshot copies of those volumes; etc. In some embodiments, the BDS System Manager module may coordinate with the Node Manager modules to manage use of volumes by programs executing on associated resource nodes, while in other embodiments the Node Manager modules may not be used to manage such volume use. In addition, in other embodiments, one or more BDS System Manager modules may be structured in other manners, such as to have multiple instances of the BDS System Manager executing in a single data center (e.g., to share the management of non-local block data storage by programs executing on the resource nodes provided by the host computing systems located within the data center), and/or such as to have at least some of the functionality of a BDS System Manager module being provided in a distributed manner by software executing on some or all of the server block data storage systems 122 (e.g., in a peer-to-peer manner, without any separate centralized BDS System Manager module on a computing system 124).

In this example, the various host computing systems, server block data storage systems, and computing systems are interconnected via one or more internal networks 126 of the data center, which may include various networking devices (e.g., routers, switches, gateways, etc.) that are not shown. In addition, the internal networks 126 are connected to an external network 130 (e.g., the Internet or other public network) in this example, and the data center 102 may further include one or more optional devices (not shown) at the interconnect between the data center and an external network (e.g., network proxies, load balancers, network address translation devices, etc.). In this example, the data center 102 is connected via the external network 130 to one or more other data centers 128 that each may include some or all of the computing systems and storage systems illustrated with respect to data center 102, as well as other remote computing systems 132 external to the data center. The other computing systems 132 may be operated by various parties for various purposes, such as by the operator of the data center or third parties (e.g., customers of the program execution service and/or of the block data storage service). In addition, one or more of the other computing systems may be archival storage systems (e.g., as part of a remote network-accessible storage service) with which the block data storage service may interact, such as under control of one or more archival manager modules (not shown) that execute on the one or more other computing systems or instead on one or more computing systems of the data center, as described in greater detail elsewhere. Furthermore, while not illustrated here, in at least some embodiments, at least some of the server block data storage systems 122 may further be interconnected with one or more other networks or other connection mediums, such as a high-bandwidth connection over which the server storage systems 122 may share volume data (e.g., for purposes of replicating copies of volumes and/or maintaining consistency between primary and mirror copies of volumes), with such a high-bandwidth connection not being available to the various host computing systems in at least some such embodiments.

It will be appreciated that the example of FIG. 1 has been simplified for the purposes of explanation, and that the number and organization of host computing systems, server block data storage systems and other devices may be much larger than what is depicted in FIG. 1. For example, as one illustrative embodiment, there may be approximately 4,000 computing systems per data center, with at least some of those computing systems being host computing systems that may each host fifteen virtual machines, and/or with some of those computing systems being server block data storage systems that may each store several volume copies. If each hosted virtual machine executes one program, then such a data center may execute as many as sixty thousand program copies at one time. Furthermore, hundreds or thousands (or more) volumes may be stored on the server block data storage systems, depending on the number of server storage systems, size of the volumes, and number of mirror copies per volume. It will be appreciated that in other embodiments, other numbers of computing systems, programs and volumes may be used.

FIG. 2 illustrates an example environment 200 including computingsystems suitable for managing the provision and use of reliablenon-local block data storage functionality to clients that can be usedin accordance with various embodiments. In this example, a managementsystem 202, such as one or more server computers including one or moreexternally-facing customer interfaces, is programmed to execute anembodiment of at least one BDS System Manager module 204 to manageprovisioning of non-local block data storage functionality to programsexecuting on host computing systems 208 and/or on at least some othercomputing systems 218, such as to block data storage volumes (not shown)provided by the server block data storage systems 220. Each of the hostcomputing systems 208 in this example also executes an embodiment of aNode Manager module 210 to manage access of programs 214 executing onthe host computing system to at least some of the non-local block datastorage volumes, such as in a coordinated manner with the BDS SystemManager module 204 over a network 216 (e.g., an internal network of adata center, not shown, that includes the computing systems 202, 208,220, and optionally at least some of the other computing systems 218).In other embodiments, some or all of the Node Manager modules 210 mayinstead manage one or more other computing systems (e.g., the othercomputing systems 218).

In addition, multiple server block data storage systems 220 areillustrated that each can store at least some of the non-local blockdata storage volumes (not shown) used by the executing programs 214,with access to those volumes also provided over the network 216 in thisexample. One or more of the server block data storage systems 220 mayalso each store a server software component (not shown) that managesoperation of one or more of the server block data storage systems, aswell as various information (not shown) about the data that is stored bythe server block data storage systems. Thus, in at least someembodiments, the server computing system 202 of FIG. 2 may correspond tothe computing system 124 of FIG. 1, one or more of the Node Managermodules 108 and 112 of FIG. 1 may correspond to the Node Manager modules210 of FIG. 2, and/or one or more of the server block data storagecomputing systems 220 of FIG. 2 may correspond to server block datastorage systems 122 of FIG. 1. In addition, in this example embodiment,multiple archival storage systems 222 are illustrated, which may storesnapshot copies and/or other copies of at least portions of at leastsome block data storage volumes stored on the server block data storagesystems 220. The archival storage systems 222 may also interact withsome or all of the computing systems 202, 208, and 220, and in someembodiments may be remote archival storage systems (e.g., of a remotestorage service, not shown) that interact with the computing systemsover one or more other external networks (not shown).

The other computing systems 218 may further include other proximate or remote computing systems of various types in at least some embodiments, including computing systems via which customers or other users of the block data storage service interact with the management and/or host systems. Furthermore, one or more of the other computing systems 218 may further execute a PES System Manager module to coordinate execution of programs on the host computing systems 208 and/or other host computing systems 218, or the management system 202 or one of the other illustrated computing systems may instead execute such a PES System Manager module, although a PES System Manager module is not illustrated in this example.

In the illustrated embodiment, a Node Manager module 210 is executing in memory in order to manage one or more other programs 214 executing in memory on the computing system, such as on behalf of customers of the program execution service and/or block data storage service. In some embodiments, some or all of the computing systems 208 may host multiple virtual machines, and if so, each of the executing programs 214 may be an entire virtual machine image (e.g., with an operating system and one or more application programs) executing on a distinct hosted virtual machine resource node. The Node Manager module 210 may similarly be executing on another hosted virtual machine, such as a privileged virtual machine monitor that manages the other hosted virtual machines. In other embodiments, the executing program copies 214 and the Node Manager module 210 may execute as distinct processes on a single operating system (not shown) executed on a single computing system 208.

The archival storage system 222 is operable to execute at least one Archival Manager module 224 in order to manage operation of one or more of the archival storage systems, such as on behalf of customers of the block data storage service and/or of a distinct storage service that provides the archival storage systems. In other embodiments, the Archival Manager module(s) 224 may instead be executing on another computing system, such as one of the other computing systems 218 or on the management system 202 in conjunction with the BDS System Manager module 204. In addition, while not illustrated here, in some embodiments various information about the data that is stored by the archival storage systems 222 may be maintained in storage for the archival storage systems or elsewhere.

The BDS System Manager module 204 and Node Manager modules 210 may takevarious actions to manage the provisioning and/or use of reliablenon-local block data storage functionality to clients (e.g., executingprograms), as described in greater detail elsewhere. In this example,the BDS System Manager module 204 may maintain a database 206 thatincludes information about volumes stored on the server block datastorage systems 220 and/or on the archival storage systems 222 (e.g.,for use in managing the volumes), and may further store various otherinformation (not shown) about users or other aspects of the block datastorage service. In other embodiments, information about volumes may bestored in other manners, such as in a distributed manner by Node Managermodules 210 on their computing systems and/or by other computingsystems. In addition, in this example, each Node Manager module 210 on ahost computing system 208 may store information 212 about the currentvolumes attached to the host computing system and used by the executingprograms 214 on the host computing system, such as to coordinateinteractions with the server block data storage systems 220 that providethe primary copies of the volumes, and to determine how to switch to amirror copy of a volume if the primary volume copy becomes unavailable.While not illustrated here, each host computing system may furtherinclude a distinct logical local block data storage device interface foreach volume attached to the host computing system and used by a programexecuting on the computing system, which may further appear to theexecuting programs as being indistinguishable from one or more otherlocal physically attached storage devices that provide local storage.

An environment such as that illustrated with respect to FIGS. 1-2 can be used to provide and manage resources shared among various customers. In one embodiment, a virtualized storage system can be provided using a number of data servers, each having a number of storage devices (e.g., storage disks) attached thereto. The storage system can expose the storage to the customers as a Web service, for example. Customers then can submit Web services requests, or other appropriate requests or calls, to allocate storage on those servers and/or access that storage from the instances provisioned for those customers. In certain embodiments, a user is able to access the data volumes of these storage devices as if those storage devices are conventional block devices. Since the data volumes will appear to the customer instances as if each volume is a disk drive or similar block device, the volumes can be addressed with offsets, lengths, and other such conventional block device aspects. Further, such a system can provide what will be referred to herein as “read after write” consistency, wherein data is guaranteed to be readable from a data volume as soon as the data is written to that volume. Such a system can provide relatively low latency, such as latencies less than about ten milliseconds. Such a system thus in many ways functions as a traditional storage area network (SAN), but with improved performance and scalability.

Using a management system as illustrated in FIG. 2, for example, a customer can make a Web service call into an appropriate API of a Web service layer of the system to provision a data volume and attach that volume to a data instance for that customer. The management system can be thought of as residing in a control plane, or control environment, with the data volumes and block storage devices residing in a separate data plane, or data environment. In one example, a customer with at least one provisioned instance can call a “CreateVolume” or similar API, via Web services, which enables the customer to specify the amount of storage to be allocated, such as a value between 1 GB and 1 TB, in 1 GB increments. Components of the control plane, such as a BDS System Manager module, can call into the data plane to allocate the desired amount of storage from the available resources, and can provide the customer with an identifier for the data volume. In some embodiments, the customer then can call an “AttachVolume” or similar API, wherein the customer provides values for parameters such as an instance identifier, a volume identifier, and a device name. The device name depends on factors such as the operating system of the instance and follows the scheme that the operating system provides for hard drives and similar storage devices, as from inside the instance there is no apparent difference, from at least a functionality and naming point of view, from a physical hard drive. Once the customer has attached the data volume to a provisioned instance, the customer can use the volume in various ways, such as to build a file system, to serve as raw storage for a data system, or for any other such activity that would normally be performed with a conventional storage device. When the customer no longer requires the data volume, or for any other appropriate reason, the customer can call a “DetachVolume” or similar API, which can cause the association of the instance to that volume to be removed. In some embodiments, the customer can then attach a new instance or perform any of a number of other such activities. Since the data volume will fail independently of the instances in some embodiments, the customer can attach a volume to a new instance if a currently associated instance fails.
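
The create, attach, and detach sequence described above can be sketched as a small in-memory stand-in for the control plane. This is an illustration only: the `ControlPlane` class, its method names, the `vol-N` identifier format, and the treatment of 1 TB as 1024 GB are assumptions, and no real Web service is called.

```python
import itertools

MIN_GB = 1
MAX_GB = 1024  # assumption: 1 TB is treated as 1024 GB


class ControlPlane:
    """Hypothetical in-memory model of the volume management calls."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.volumes = {}      # volume_id -> size in GB
        self.attachments = {}  # volume_id -> (instance_id, device_name)

    def create_volume(self, size_gb):
        # Sizes are whole gigabytes, i.e. 1 GB increments.
        if not isinstance(size_gb, int) or not MIN_GB <= size_gb <= MAX_GB:
            raise ValueError("size must be a whole number of GB from 1 to 1024")
        volume_id = f"vol-{next(self._ids)}"
        self.volumes[volume_id] = size_gb
        return volume_id

    def attach_volume(self, instance_id, volume_id, device_name):
        if volume_id not in self.volumes:
            raise KeyError(volume_id)
        if volume_id in self.attachments:
            raise RuntimeError("volume is already attached")
        self.attachments[volume_id] = (instance_id, device_name)

    def detach_volume(self, volume_id):
        # Removes the association; the volume itself is kept, so it can
        # later be attached to a different instance.
        del self.attachments[volume_id]
```

A caller in this sketch would create a volume, attach it to one instance under a device name such as `/dev/sdf` (a hypothetical value), detach it, and then attach the same volume identifier to a replacement instance.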

In certain approaches, a customer requesting a data volume is not able to select or request a particular type of volume, or a particular type of performance. A customer is typically granted an amount of storage, and the performance follows a “best effort” type of approach, wherein customer requests are performed based on the capability, load, and other such factors of the system at the time of the request. Each customer is typically charged the same amount per unit measure, such as the same dollar amount per gigabyte of storage per month, as well as the same amount per number of I/O requests per month, charged in an amount such as in increments of millions of requests per month.

A PES or similar system or service can enable customers to ensure a minimum level of performance by enabling each customer to specify one or more committed rates or other performance guarantees. In addition to a minimum amount of storage, each customer can purchase a committed rate of operations, such as a specific rate of input/output (I/O) operations. In previous systems, performance guarantees were obtained by dedicating an entire machine to a customer, along with dedicated bandwidth, etc., which often is overkill. Embodiments discussed herein can allow customers to purchase performance guarantees at any appropriate level of granularity. By managing the performance allocations for customers on various resources, systems and methods in accordance with various embodiments can enable customers to purchase volumes that have an I/O operations per second (IOPS) guarantee at any appropriate level, such as between 1 IOPS and 5,000 IOPS. By allocating portions of disks, spindles, and other such resources, a system can offer customers guaranteed levels of storage and/or I/O operations rates.

Such a system or service can also enable users to share resources, providing specific guarantees or commitments with respect to those resources at a level of granularity that is not possible with conventional solutions. In many cases, customers may wish to specify a minimum processing rate, such as a minimum rate of I/O operations. Approaches in accordance with various embodiments can commit the desired amount of server, storage, and/or other resources necessary to provide at least a committed level of performance. By committing to a level of performance, a customer can receive a consistent quality of service level that is not affected by the performance of other customers sharing a device or resource. Even in an overload situation, the customer can receive at least the guaranteed level of service. The amount of guaranteed service can depend upon various factors, as well as the amount specified and paid for by the customer.

For example, FIG. 3 illustrates an example distribution 300 wherein the processing capacity of a server 302 is allocated among several customers. In this example, the server is determined to have a capacity for about 500 IOPS. This value can be an estimated or average value, and can be determined or adjusted over time based on monitored performance or other such information. While all 500 IOPS can be allocated in some embodiments, it can be desirable in other embodiments to only allocate a threshold amount, percentage, or other portion of the total capacity as guarantees. Since the processing time for each request can vary, the number of IOPS at any given time can vary as well, such that allocating all 500 IOPS might cause short periods of time where the customers are unable to receive their guarantees when the actual performance is on the order of 450 IOPS due to the nature of the requests being processed, etc.

In this example, the system might be able to allocate up to 400 of the 500 IOPS available for the server 302. As can be seen, Customer A has been allocated a committed 200 IOPS, Customer B has been allocated a committed 100 IOPS, and Customer C has been allocated a committed 55 IOPS. The remaining customers on the server then can utilize a “best performance” or similar approach sharing the remaining 145 IOPS (on average). The number of customers sharing the remaining IOPS can be selected or limited based upon a number of factors, such that the remaining customers can still obtain a desirable level of performance a large percentage of the time.
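
The bookkeeping in this example can be sketched in a few lines of Python. The figures (500 IOPS capacity, 400 IOPS committable, commitments of 200, 100, and 55) come from the example above; the `ServerAllocation` class and its method names are hypothetical and shown only to make the arithmetic concrete.

```python
class ServerAllocation:
    """Tracks committed IOPS on one server (illustrative sketch only)."""

    def __init__(self, capacity_iops, committable_iops):
        self.capacity_iops = capacity_iops        # estimated total capacity
        self.committable_iops = committable_iops  # portion that may be guaranteed
        self.committed = {}                       # customer -> committed IOPS

    def total_committed(self):
        return sum(self.committed.values())

    def commit(self, customer, iops):
        # Refuse a guarantee that would exceed the committable threshold.
        if self.total_committed() + iops > self.committable_iops:
            raise ValueError("commitment would exceed committable capacity")
        self.committed[customer] = self.committed.get(customer, 0) + iops

    def uncommitted(self):
        # Average capacity left for customers without guarantees.
        return self.capacity_iops - self.total_committed()


server = ServerAllocation(capacity_iops=500, committable_iops=400)
server.commit("A", 200)
server.commit("B", 100)
server.commit("C", 55)
print(server.uncommitted())  # 145
```

With 355 IOPS committed, a further request for a 50 IOPS guarantee would be rejected in this sketch because it would bring the total to 405, above the 400 IOPS threshold.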

In many cases, however, Customers A, B, and C will not all utilize their entire committed capacity. Each of those customers might pay to guarantee a level of performance such that the level is available when needed, but often will not actually be running near that peak capacity. In this situation, the remaining Customers D-Z can actually share more than the remaining 145 IOPS, or “remnants,” as those customers can utilize available capacity from the committed IOPS that are not being currently used. This provides another advantage, as customers can receive guaranteed levels of performance, but when those levels are not being fully utilized the remaining capacity can be used to service other customer requests. Such an approach enables the regular customers (without guarantees) to receive improved performance, without the need for the provider to purchase excess capacity or provide capacity that is not being utilized a vast majority of the time.
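
One way to picture the capacity available to customers without guarantees is as the server capacity minus the committed capacity that is actually in use at a given moment. The function below is a minimal sketch under that assumption; `best_effort_capacity` and its parameters are hypothetical names, and the usage figures in the note that follows are invented for illustration.

```python
def best_effort_capacity(capacity_iops, committed, current_usage):
    """Return the IOPS currently available to customers without guarantees.

    committed:     customer -> committed IOPS
    current_usage: customer -> IOPS the customer is issuing right now
    Only usage up to each commitment counts as guaranteed traffic here.
    """
    in_use = sum(
        min(current_usage.get(customer, 0), rate)
        for customer, rate in committed.items()
    )
    return capacity_iops - in_use
```

For instance, with the commitments from the example (200, 100, and 55 IOPS on a 500 IOPS server), if Customer A were issuing 120 IOPS, Customer B 100 IOPS, and Customer C none, this sketch would report 280 IOPS available to Customers D-Z rather than 145.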

In some embodiments, any of Customers A-C can exceed their performance guarantees. For example, Customer A might, for a period of time, submit requests on the order of 250 IOPS. For the 50 IOPS above the committed rate, those requests in some embodiments can be treated as normal requests and processed at the same performance level as those of Customers D-Z. In an overload situation, any throttling, slow down, or other reduction in processing can then be applied to the 145 or so IOPS that are not subject to guarantees. The guaranteed levels for Customers A, B, and C will not be affected, as the overflow adjustments are made to the non-committed portion. Accordingly, customers with non-guaranteed levels of service can be charged lower prices per request, period, etc.
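
One way to picture this behavior is the following sketch, which serves committed demand first and then divides whatever capacity is left among all remaining demand. The function `share_interval` and the proportional sharing rule are assumptions made for illustration; the text does not specify how the non-committed portion is divided.

```python
def share_interval(capacity, commitments, demand):
    """Grant IOPS for one interval (hypothetical policy).

    Demand within a commitment is always granted. Any overage by a
    committed customer is treated like demand from a customer without
    a guarantee, and all such demand is scaled down proportionally
    when the server is overloaded.
    """
    granted = {}
    for customer, wanted in demand.items():
        granted[customer] = min(wanted, commitments.get(customer, 0))

    leftover = capacity - sum(granted.values())
    unmet = {c: demand[c] - granted[c] for c in demand if demand[c] > granted[c]}
    total_unmet = sum(unmet.values())
    if total_unmet > 0:
        # Throttling applies only to the non-committed portion.
        scale = min(1.0, leftover / total_unmet)
        for customer, amount in unmet.items():
            granted[customer] += amount * scale
    return granted


commitments = {"A": 200, "B": 100, "C": 55}
demand = {"A": 250, "B": 50, "C": 55, "D": 300}  # D has no guarantee
granted = share_interval(500, commitments, demand)
```

In this run Customer B uses only 50 of its 100 committed IOPS, so the capacity shared by Customer D and by the overage of Customer A is 195 IOPS rather than 145.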

In other embodiments, when any of Customers A-C exceeds its performance guarantees, that customer can receive a “blended” or other level of service. In a situation where each request for a customer is treated individually or without context, such that any single request over a committed rate can be treated as a request without a committed rate, there can be a negative impact on the other requests for that customer. For example, if Customer A has a committed rate of 250 IOPS and at one point issues 251 requests in a second, that single request over the rate commitment can be processed much more slowly than the other requests, such as at 20 ms instead of 1 ms. If the customer application is expecting a performance level of about 1 ms and experiences a slowdown with respect to one request, that can have an impact on the fulfilling of the other requests as well, and can cause a significant slowdown or other problems for the application even though the customer only slightly exceeded the threshold for a short period of time.

A PES Manager can address such a situation by providing a “boost” or blended rate to customers with rate guarantees who exceed those guarantees, which provides a level of service between a committed and uncommitted rate. For example, a customer with a rate guarantee might have any excess requests placed at or near the front of the “queue” for uncommitted requests. In other embodiments, the customer might receive a lower rate commitment for those requests, which might experience a delay of about 5 ms, such that they are not processed at the same rate as requests within the committed rate, but are processed more quickly than requests for customers without a committed rate. The amount of delay can be related in some embodiments to the amount of overage and the length of time that the customer is over the guaranteed rate, to provide a relatively uniform degradation in performance that is at least somewhat proportional to the amount of overage. For example, a customer with a guaranteed rate of 100 IOPS who is consistently sending requests at a rate of 500 per second would likely not receive as much of a boost as a customer with a 250 IOPS guaranteed rate who occasionally goes over by a handful of requests. In some embodiments, a customer can be provided with the same rate for any overage, but can be charged a premium for each such request. Many other variations are possible as well within the scope of the various embodiments.
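
A blended rate of this kind might be sketched as a delay that grows with the relative overage. The linear interpolation below is an assumption chosen for simplicity, and the function name `blended_delay_ms` is hypothetical; the 1 ms and 20 ms endpoints are taken from the example above. The sketch ignores the length of time the customer has been over the guaranteed rate, which the text names as a second possible input.

```python
def blended_delay_ms(committed_rate, observed_rate,
                     committed_delay_ms=1.0, uncommitted_delay_ms=20.0):
    """Delay applied to requests that exceed a committed rate.

    The overage is expressed as a fraction of the committed rate and
    capped at 1.0, so a customer far over the guarantee receives the
    ordinary uncommitted delay and no boost.
    """
    overage = max(0, observed_rate - committed_rate)
    ratio = min(1.0, overage / committed_rate)
    return committed_delay_ms + ratio * (uncommitted_delay_ms - committed_delay_ms)


slight = blended_delay_ms(250, 251)  # one request over a 250 IOPS guarantee
heavy = blended_delay_ms(100, 500)   # consistently five times the guarantee
```

Under these assumptions the customer who is one request over a 250 IOPS guarantee sees a delay barely above 1 ms, while the customer sending 500 requests per second against a 100 IOPS guarantee sees the full uncommitted delay.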

To manage the commitments, components of a control plane can essentially make reservations against specific servers or other resources in the data plane. In FIG. 3, where three customers want a total of 355 IOPS committed, the control plane can reserve that level against a single server, for example, and allocate the remainder to any other customer provisioned on that server. The control plane can also ensure that more volumes are not allocated to a server than the server can handle, due to space limitations, the number of I/Os that need to be generated, or any other such factor.

In some cases, a customer might want a guaranteed level of service that exceeds the “committable” capacity for a given resource. For example, in FIG. 3 it was stated that the server could allocate 400 IOPS, but 355 are already allocated to Customers A-C. If another customer wants 300 IOPS, that number would exceed the allowed amount (as well as the average capacity) of the server. Thus, the customer cannot receive the desired commitment on that server. Using the management components of the control plane, however, the commitment rate can be allocated across multiple servers. For example, in the allocation 400 of FIG. 4, it is shown that Customer A sends a request from a user device 402 requesting a guarantee of 300 IOPS. The control plane in some embodiments can search the available servers to determine if a server is available with 300 IOPS left for guarantees. If not, the control plane can attempt to spread the IOPS across as few servers as possible. In this case, the control plane determines to allocate the IOPS guarantee across three servers, with a first server 404 providing a guarantee of 100 IOPS, a second server 406 providing a guarantee of 125 IOPS, and a third server 408 providing a guarantee of 75 IOPS. Thus, a volume does not need to be resident on a single server as in many conventional systems, but can be partitioned across multiple servers. The allocation across multiple servers also enables customers to utilize larger data volumes, such as volumes of 50 terabytes instead of 1 terabyte, as the data can be spread across multiple servers. In such an embodiment, a customer can purchase between 1 GB and 50 TB of storage, for example, with a desired commitment rate, such as a rate between 0 IOPS and 5,000 IOPS. Based on one or more of these values selected by a customer, the control plane can determine an appropriate, if not optimal, way to provide those guarantees using available resources in the data plane.
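
Spreading a guarantee across as few servers as possible can be sketched with a greedy pass that draws first from the servers with the most committable capacity remaining. The function `spread_guarantee` is hypothetical, and the per-server availability figures are assumptions chosen so that the result matches the split shown in FIG. 4; the text does not state how much capacity each server had available.

```python
def spread_guarantee(available, requested):
    """Split a requested IOPS guarantee across as few servers as possible.

    available -- mapping of server id to IOPS still open for guarantees
    Returns a mapping of server id to IOPS reserved on that server, or
    None when the servers together cannot cover the request.
    """
    if sum(available.values()) < requested:
        return None
    plan = {}
    # Taking the largest remaining capacities first minimizes the
    # number of servers needed to reach the requested total.
    for server, free in sorted(available.items(), key=lambda kv: kv[1], reverse=True):
        if requested == 0:
            break
        take = min(free, requested)
        if take > 0:
            plan[server] = take
            requested -= take
    return plan


# Assumed availability on servers 404, 406, and 408 of FIG. 4.
plan = spread_guarantee({"404": 100, "406": 125, "408": 75}, 300)
```

When a single server has at least the requested amount available, the same pass places the whole guarantee on one server.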

In some embodiments, the committed rate might be allocated up to 100% of the capacity of a server. An amount of un-committed usage can be predicted and/or monitored, such that a number of customers can be allocated to resources that are fully committed, as long as those customers are willing to take resources only as they become available. Certain customers might not care when IOPS occur, particularly for certain writes, such that they would be willing to pay a lower rate to utilize resources that are guaranteed up to 100%, knowing that some customers likely will not utilize their full guaranteed levels. Such an approach assists the provider in maximizing the utilization of each resource by allocating un-committed IOPS on resources that are otherwise “fully” committed.

Further, different types of customers will have different requirements. For example, if a disk has 100 TB of space and 100 IOPS capacity, a first customer might want to store 90 TB of vacation photos that are rarely accessed. That customer might be interested in purchasing 90 TB of storage space along with an uncommitted rate of I/O operations. Another user might want a 1 TB database that is going to be under constant use, such that the user might want about 100 IOPS. In this example, the first customer could be sold 90% of the disk for storage, and the other customer can be allocated 90% (or more) of the I/O operation capacity of the disk as a commitment. Due to the nature of the customers, they both could be provisioned on the same disk, where otherwise each might have required a dedicated disk.
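
The co-location decision in this example amounts to checking two independent budgets, storage space and committed IOPS. The following sketch uses a hypothetical function `fits_on_disk` and a hypothetical tenant record layout; the disk and customer figures follow the example above.

```python
def fits_on_disk(disk_tb, disk_iops, tenants):
    """Check that tenants fit within both the space and IOPS budgets."""
    storage = sum(t["tb"] for t in tenants)
    committed = sum(t["committed_iops"] for t in tenants)
    return storage <= disk_tb and committed <= disk_iops


photos = {"tb": 90, "committed_iops": 0}    # large, rarely accessed
database = {"tb": 1, "committed_iops": 90}  # small, constantly used

same_disk = fits_on_disk(100, 100, [photos, database])
two_databases = fits_on_disk(100, 100, [database, database])
```

The photo archive and the database fit together because each is heavy in a different dimension, whereas two such databases would exceed the IOPS budget of the disk despite using only 2 TB.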

Enabling others to utilize the unused portion of a customer's committed allocation can benefit that customer as well, because the customer may not have to pay for the entire allocation and thus can receive a lower cost than would be required for a dedicated resource. Further, the customer will still receive the guaranteed level of service. When the customer is at the full committed level, other customers on that device will have to reduce their rate of request or wait longer per request. In some embodiments, a resource can be fully committed and other users can still be provisioned on the device to utilize the unused portions of the resource. In some cases, where predictions and monitoring accurately support such use, a resource can even be committed for over 100%, where the actual use by the allocated customers will almost never equal or surpass 100% usage. In such an embodiment, there can be other resources that can pick up any overage in the unlikely event that the resource is overloaded.

In order to make commitments on a new resource (or new instance of a resource), certain default information can be used. It can be desirable to use relatively conservative numbers as the defaults, in order to prevent over-committing a resource. For example, a control plane component can use general default information that each spindle of a particular type can handle 100-120 IOPS. If there are twelve spindles per server, there can be about 1200-1440 IOPS available per server. The control plane components can be conservative, initially, and can allocate a first amount, such as up to 400 IOPS, until more information is gained about the performance and usage of that resource. In certain examples customer utilization is about 10%, such that in many instances customers are using only 10% of the available IOPS. Thus, dedicating 40% to guaranteed IOPS would still be four times more than is actually being used, and thus likely is still a conservative number. Each server in the data plane can track the amount of available space on the server, and can store the number of IOPS that are committed for that server. Thus, when a new volume is to be created, the control plane components can determine a server that, out of that 400 IOPS, has enough capacity available that the server is willing to commit for that volume. An approach in one embodiment is to ask servers, at random or in a particular order, whether they can take a specific number of IOPS, and this continues until a server is located that can accept the IOPS. When the information is also stored in the control plane, however, the control plane can select an appropriate server first and then contact that server to take the volume.
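
The default estimate and the server polling described above can be sketched as follows. The function names `default_capacity` and `find_server` and the server bookkeeping are hypothetical; the per-spindle range, the spindle count, and the initial 400 IOPS commit limit come from the text.

```python
def default_capacity(spindles, low_per_spindle=100, high_per_spindle=120):
    """Estimated IOPS range for a server before any measurements exist."""
    return spindles * low_per_spindle, spindles * high_per_spindle


def find_server(committed_by_server, requested, commit_limit=400):
    """Ask servers in order until one can accept the requested IOPS.

    committed_by_server -- mapping of server id to IOPS already committed
    Returns the first server that stays within the conservative commit
    limit, or None when no server can take the volume.
    """
    for server, committed in committed_by_server.items():
        if committed + requested <= commit_limit:
            return server
    return None


estimate = default_capacity(12)
chosen = find_server({"s1": 380, "s2": 250}, 100)
```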

In many situations, however, a user will not utilize the throughput (or other functional aspects) provided by a guarantee such as those described above. As illustrated in FIG. 4, the user might have a guaranteed available rate of I/O operations provided by three different servers, but during normal use might only use a rate of operations that could be provided by one of those servers, or a portion of each server. In such an example, the user might prefer not to have to pay for the guarantees, or dedicated rate of I/O operations, at all times. The user might be willing to instead pay for a certain amount of dedicated capacity, such as a dedicated rate of 125 IOPS that are always dedicated to the user. For the other 175 IOPS that the user only uses occasionally, however, the user might be willing to reserve capacity that can enable other users to utilize that capacity while the capacity is not being used, in order to help spread the cost of the capacity to those other users. In other cases, the user might want to only pay for those requests that exceed the dedicated (or reserved) capacity. Thus, the user might prefer to get priority for those requests over requests from average users, but might not want to be charged for more capacity than is actually being used. The user in many embodiments can achieve this by submitting a bid price along with the request, which will cause that request to receive priority treatment if that bid exceeds the current market price and/or exceeds any other pending bid for the same type of capacity. Such a process can be complex for a large number of users with different types of requests and requirements.
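
A simple form of the bid comparison can be sketched as below. The function `select_winners`, the tuple layout, and the rule of serving the highest bids first until capacity runs out are assumptions for illustration; the text states only that a bid receives priority when it exceeds the current market price and/or other pending bids.

```python
def select_winners(bids, market_price, capacity):
    """Pick winning bids for one type of excess capacity.

    bids -- list of (user, bid_price, requested_amount) tuples
    A bid wins when it exceeds the market price and its requested
    amount still fits in the remaining capacity; higher bids are
    considered first.
    """
    winners = []
    for user, price, amount in sorted(bids, key=lambda b: b[1], reverse=True):
        if price > market_price and amount <= capacity:
            winners.append(user)
            capacity -= amount
    return winners


pending = [("u1", 5, 100), ("u2", 8, 150), ("u3", 2, 50)]  # prices in cents
winners = select_winners(pending, market_price=3, capacity=250)
```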

Systems and methods in accordance with various embodiments, such as the systems described with respect to FIGS. 1 and 2, can be used to manage these and other functional aspects of one or more types of shared resource, in order to provide flexibility and management of the way in which those shared resources are utilized. Shared resources can provide storage and/or processing capacity, with various levels of throughput, bandwidth, latency, and other such aspects. In one example, a number of customers interact with at least one PES Manager module (or other such module, process, or component) to process various types of requests, execute programs, or otherwise access resources on one or more resource nodes, with the PES Manager module providing some or all of the functionality of a particular program execution service. The customers can include different types of customers, including customers with dedicated rates or guarantees of at least one functional aspect or resource capacity, such as may include storage capacity, computing capacity, storage and/or network bandwidth, throughput, and/or latency. There also can be customers who utilize excess resource capacity, customers who utilize on-demand variable capacity, and other types of customers and/or other users.

The resource nodes can be provided for use in executing instructions or fulfilling requests on behalf of the users, and in some embodiments may include multiple physical computing systems, virtual machines, storage instances, or other such resources that are hosted on one or more physical systems. Each of the resource nodes has some amount of resources available that provide a specific amount of resource capacity, such as may be measured, for example, by a combination of one or more of processing capacity (e.g., number and/or size of processing units), memory capacity, storage capacity, bandwidth capacity, latency capacity, etc. In some embodiments, the PES provider may provide preconfigured resource nodes, with each preconfigured resource node having similar and/or equivalent amounts of resources available to users, while in other embodiments the PES provider may provide a selection of various different resource nodes from which a user may choose, or that might otherwise be assigned to one or more users. In some embodiments, the resources can be offered as individual components which the user can utilize independently of any other resource. In other embodiments, resources can be offered in packages, groups, or other such combinations. In one example, a user might make a request for a system that includes many resource types, each of which may have associated capacity requirements. If at least one of those requirements cannot be met, some embodiments will reject the whole system request while other embodiments can allow the user to obtain those types where the requirements can be met, or ask whether lesser requirements can be used for certain resource types. In other cases, a user can obtain one type of resource, such as an amount of storage, independent or separate from another type of resource, such as an amount of compute capacity.

In at least some embodiments, fees are associated with the use of a PES, such that the PES may process requests on behalf of a user in exchange for payment of one or more fees by that user. For example, in some embodiments, fees may be charged to a user based on an amount and/or type of resource capacity allocated for a user, such as may be based on one or more of a number of processing units, an amount of memory, an amount of storage, an amount of network resources, etc., allocated to the user. In some embodiments, fees may be based on other factors, such as various characteristics of the resources used, such as, for example, based on CPU capabilities or performance, platform type (e.g., 32-bit, 64-bit, etc.), storage type (e.g., disk or flash), etc. In some embodiments, fees may be charged on the basis of a variety of use factors, such as a price per use of the service, a price per unit of time that computing services are used, a price per storage used, a price per amount of data transferred in and/or out, etc. In at least some embodiments, as discussed in more detail below, fees may be based on various other factors, such as related to availability of the program execution capacity (e.g., varying degrees of availability, such as guaranteed availability and/or variable availability) and/or various properties related to executing programs (e.g., continuity of execution, fault tolerance, etc.). In at least some embodiments, a provider of a PES may offer one or more of various tiers, types and/or levels of services or functionality for executing programs on behalf of multiple users, and in some such embodiments, various fees may be associated with the various tiers, types and/or levels of services.
For example, in some embodiments, a user may be charged one or more fees in conjunction with use of dedicated resource capacity and/or functionality provided by a PES, such as fees that are respectively lower than fees associated with comparable use of an on-demand variable program execution capacity service of the PES. The lower fees may reflect, for example, the user entering into a long-term agreement for a specified use time period (e.g., a number of weeks, months, years, etc.), such as to pay one or more specific rates over the term of the agreement (e.g., up front and/or periodically). In addition, for example, tiers may be used for a specific type of functionality provided by a PES, such as to charge fees at a first tier for a first quantity of dedicated resource capacity functionality (e.g., up to a specified first threshold of resource nodes being used), to charge fees at a second tier (e.g., a lower price tier) for a second quantity of dedicated resource capacity functionality (e.g., above the specified first threshold and up to a specified second threshold of resource nodes being used), etc. Tiers may further be based on various factors other than quantity of functionality that is used in at least some embodiments, whether instead of or in addition to being based on quantity of functionality used. Additional details related to various fees associated with a program execution service are included in pending U.S. patent application Ser. No. 11/963,331, filed Dec. 21, 2007 and entitled “Providing Configurable Pricing for Execution of Software Images,” which is hereby incorporated by reference in its entirety.
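
The threshold-based tiers described above can be sketched as a marginal fee computation. The function `tiered_fee`, the thresholds of 10 and 50 resource nodes, and the prices are all hypothetical values chosen for illustration; none of them appear in the text.

```python
def tiered_fee(nodes_used, tiers):
    """Compute a fee where each tier prices only the nodes within it.

    tiers -- list of (upper_threshold, price_per_node_in_cents) in
             ascending threshold order
    """
    if nodes_used > tiers[-1][0]:
        raise ValueError("usage exceeds the last tier threshold")
    fee = 0
    previous_threshold = 0
    for upper, price in tiers:
        if nodes_used <= previous_threshold:
            break
        in_tier = min(nodes_used, upper) - previous_threshold
        fee += in_tier * price
        previous_threshold = upper
    return fee


# First 10 nodes at 100 cents each, nodes 11-50 at a lower 80 cents each.
example_tiers = [(10, 100), (50, 80)]
fee = tiered_fee(25, example_tiers)  # 10 * 100 + 15 * 80
```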

A use time window for a period of dedicated or reserved resource capacity may be specified in various manners in various embodiments, such as to indicate a specified period of time in which a user has access to dedicated program execution capacity (e.g., a number of days, weeks, months, years, etc.), a duration of time in which one or more programs may be continuously executed for a user (e.g., a number of hours the one or more programs may execute within any given period, such as an hour a day, an hour a week, etc.), a window of time in which one or more programs may execute (e.g., between 1:00 p.m. and 3:00 p.m. every other day), etc. As previously noted, in some embodiments an electronic marketplace may be provided for users of a PES, such that dedicated capacity users may provide some or all of their specified use time period for dedicated capacity to one or more other users in exchange for payment from those one or more other users, such that the one or more other users may use the provided portions of dedicated capacity to process requests and/or fulfill various types of operations on behalf of the one or more other users, and the dedicated capacity user may receive payment for such use. In other embodiments, a dedicated capacity user may temporarily provide use of some portion of the dedicated capacity for use by one or more other users based in part on the one or more other users having an urgent need of the capacity, such as may be indicated by a willingness of the one or more other users to pay a premium for use of the dedicated capacity (e.g., a rate greater than that paid by the dedicated capacity user), and in at least some such embodiments a portion and/or all of the fees collected from the one or more other users may be provided to the dedicated capacity user.

A variable capacity user can interact with the PES Manager to configure and/or submit a control plane request specifying on-demand variable resource capacity, such as by submitting an instance request for immediate creation of a resource instance and/or providing information for later such creation. After a request for immediate execution is received, the PES Manager can determine whether there is a sufficient amount of resource capacity to satisfy the request, and if so the PES Manager can initiate the creation of the instance (or perform another such action). In cases where a user schedules an instance request for one or more future times, the PES Manager may attempt to reserve an appropriate amount of resource capacity for launching those instances at the one or more future times, and/or may delay the determination of which resources to use until a later time (e.g., such as when the one or more future times occur).

If the PES Manager is unable to allocate resource capacity for fulfilling a variable capacity user instance request, the request may fail, such that the request is not processed. In such cases, the user may resubmit a failed request for later fulfillment. As previously noted, in some embodiments a variable capacity user may be charged various fees in association with use of the PES, such as based on an amount or type of capacity used, a duration of time the capacity is used, etc. In addition, while not illustrated, some portion of the shared resources may be specified to provide the on-demand variable capacity, while in other embodiments the on-demand variable capacity may be provided in other manners (e.g., using all of the resource instances; using all of the resource instances that are not allocated for another purpose, such as for dedicated capacity; etc.).

In addition, a portion of the shared resources can be allocated for use by one or more dedicated capacity users, such that each of the dedicated capacity users can have priority access to capacity on at least some portion of those resources. For example, each dedicated capacity user may have one or more resource nodes dedicated for launching instances and/or fulfilling operations of that user during a specified use time period, such that the user may access the one or more resource nodes at any time during the specified use period on behalf of the user and/or may continuously utilize the one or more resource nodes for the duration of the specified period. As one specific example, one or more of the dedicated capacity users may enter into a long-term (e.g., 1 year term) agreement with the PES provider, such that each of those users has priority access to a dedicated amount of resource capacity over the term of the agreement in exchange for a fixed fee payment (e.g., upfront or periodically billed) and, in some cases, other use fees (e.g., variable fees associated with use of various resources, such as electricity, physical rack space, network utilization, etc.).

After a dedicated capacity user interacts with the PES Manager to obtain priority use of a dedicated resource capacity, the PES Manager may allocate one or more resource instances (e.g., resource nodes) for dedicated use by the user. In some embodiments, resource capacity is allocated for priority use by an associated specific dedicated capacity user for an entire use period. In other embodiments, rather than allocate specific resource capacity to specific dedicated users for an entire use period, the PES Manager instead allocates capacity from a dedicated group of resources, such that an appropriate amount of capacity to satisfy the requests from the various dedicated capacity users is available in the dedicated resource group. In some such embodiments, after an instance request is received for a dedicated user on one or more dedicated resources, an appropriate amount of capacity may be selected from the dedicated resource group at substantially the time of the received instance request. After the selected amount of resources is no longer needed for the dedicated user (e.g., after termination and/or completion of the request), those resource instances may be returned to the dedicated resource group for use by other dedicated capacity users, and in some embodiments may further be tracked as being available for use as part of a private pool of excess resource capacity for that dedicated user, as discussed below. In addition, after a use period for a particular dedicated capacity user expires, the one or more resource instances allocated for use by that user may similarly be released for use by others, such as by, for example, making the resource instances available to be allocated for use by one or more other (e.g., new) dedicated resource capacity users.
In addition, the PES Manager may perform one or more various other management operations with respect to fulfilling instance requests, such as, for example, enforcing use periods and/or other restrictions associated with requests and/or users submitting requests, freeing up resources to fulfill the requests, authorizing and/or authenticating the requests and/or the requesting users, etc. In some embodiments, a delay may be incurred between a time that a request on dedicated resource capacity is received and a time that the request is fulfilled, such as a delay period for performing various of the management operations, etc. In various other embodiments, resources for dedicated capacity users may be allocated, tracked, reserved and/or released using various other techniques.
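
Allocation from a dedicated resource group at request time, followed by return of the instances when the request completes, can be sketched as a small pool. The class `DedicatedGroup` and its method names are hypothetical, and the sketch omits use periods, authorization, and private excess capacity tracking.

```python
class DedicatedGroup:
    """Pool of resource instances shared by dedicated capacity users."""

    def __init__(self, instances):
        self.free = set(instances)
        self.in_use = {}  # user -> set of instances currently held

    def acquire(self, user, count):
        """Select instances at the time of the request, or return None."""
        if count > len(self.free):
            return None
        picked = {self.free.pop() for _ in range(count)}
        self.in_use.setdefault(user, set()).update(picked)
        return picked

    def release(self, user):
        """Return a user's instances to the group for other users."""
        self.free |= self.in_use.pop(user, set())


group = DedicatedGroup(["n1", "n2", "n3"])
held = group.acquire("alice", 2)
denied = group.acquire("bob", 2)  # only one instance remains free
group.release("alice")
```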

In addition, multiple excess capacity users can interact with the PES Manager to configure and/or submit instance requests to be fulfilled using excess resource capacity of the PES. Such excess capacity users may include users who use private excess capacity pools and/or one or more general excess capacity pools. As previously noted, excess resource capacity may include excess and/or unused resource capacity (e.g., processing capacity, storage capacity, throughput, bandwidth, latency, etc.) that may be otherwise allocated for other uses, and in some embodiments may be separated into at least one general excess capacity pool that includes the excess resource capacity that is not in use as part of one or more other private excess capacity pools. For example, excess resource capacity may include a number of resource instances (e.g., resource nodes) that are otherwise allocated for other purposes (e.g., for use by dedicated capacity users, variable capacity users, and/or other users), but are not currently being used for those purposes. The excess capacity users may configure instance requests to be fulfilled in various ways, such as by specifying a number and/or type of resource instances to be used, a minimum and/or maximum number of resource instances to use, an expiration time for the fulfillment, a preferred time and/or period of fulfillment, one or more bids for payment of use of excess resource capacity (e.g., a bid per each use of a resource instance, a bid per use of a resource per some unit of time, a minimum and/or maximum bid, etc.), etc.

A PES Manager (or similar module or component) can determine when to include and/or remove one or more resource instances from excess resource capacity that is available for use by excess capacity users, when to initiate and/or terminate fulfillment of instance requests for excess capacity users, and which resource instances to use to process the requests for excess capacity users. In addition, a PES Manager may further track how much excess resource capacity is available for each excess capacity user in private excess capacity pools for those users, such as for some or all excess capacity users that are also dedicated capacity users. In various embodiments, the PES Manager may determine that one or more resource instances are unused and/or otherwise available for use by excess capacity users in various ways. For example, the PES Manager may receive indications from various users and/or entities that one or more resource instances are not being used or are otherwise available for use by excess capacity users, such as indications from one or more dedicated capacity users that they are not using some number and/or portion of the resource instances dedicated for use by those users. In some such embodiments, the dedicated capacity users may indicate one or more times at which dedicated resource instances are likely to be (or are) committed by the dedicated capacity users to be unused and/or available (e.g., particular times of day, particular days, periods of time, etc.). In addition, one or more other users may interact in similar manners to indicate that one or more resource instances, such as one or more resource nodes under the control of the one or more other users (e.g., third party computing systems, not shown), are available for use by excess capacity users.

In some embodiments, the PES Manager may automatically determine when resource instances are available for excess capacity users, such as by monitoring some or all of the instances and/or by tracking usage patterns of one or more users of the instances. In some such cases, determining whether resource instances are unused or otherwise underutilized may include determining and/or predicting a likelihood that the instances will remain unused for at least a period of time sufficient to process requests of one or more excess capacity users, such as may be based on an analysis of past usage patterns of one or more users. In various embodiments, a period of time sufficient to process instance requests of one or more excess capacity users may be based on one or more considerations, such as a time to stop/start fulfillment on behalf of users, a time to configure resources for use, a type of instance request (e.g., some types of request may perform useful amounts of work in short periods of time, such as various types of data processing, etc., while other requests use longer periods of time before useful results are produced), etc.
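
One very simple way to predict such a likelihood from past usage is to take the fraction of previously observed idle periods that lasted at least as long as the time needed. The function `likelihood_idle` and the empirical-fraction estimator are assumptions for illustration; the text does not prescribe a prediction method, and the history values are invented.

```python
def likelihood_idle(past_idle_minutes, required_minutes):
    """Estimate the chance that an instance stays unused long enough.

    past_idle_minutes -- durations of earlier idle periods for the instance
    required_minutes  -- time needed to configure, start, and do useful work
    """
    if not past_idle_minutes:
        return 0.0  # no history, so make no optimistic assumption
    long_enough = sum(1 for m in past_idle_minutes if m >= required_minutes)
    return long_enough / len(past_idle_minutes)


history = [5, 40, 90, 120]  # hypothetical observations, in minutes
estimate = likelihood_idle(history, required_minutes=30)
```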

After it is determined that one or more resource instances are available for use by one or more excess capacity users, the instances can be added to a general pool of available excess resource capacity and/or otherwise tracked as being part of one or more private excess capacity pools, such that the instances may be used by the PES Manager for processing requests on behalf of corresponding excess capacity users until such time that other uses of the resource instances arise (e.g., priority usage by dedicated capacity users, variable capacity users, etc.). The PES Manager may further determine that one or more of the excess capacity resource instances is no longer available for use by excess capacity users. For example, the PES Manager may receive indications that one or more resource instances is no longer available, such as may be based at least in part upon explicit requests to stop use of the resource instances from a user that controls those instances, instance requests from priority users on the one or more instances, an expiration of a specified period of availability, etc. As another example, the PES Manager may automatically determine other uses for the resource instances, such as may be based upon received requests from one or more users that correspond to the other uses, or based on determining a likely demand for one or more resource instances (e.g., based on detecting an increased usage of other requests or processes for which the resources may be used, etc.).

In some embodiments, an excess capacity user may interact with the PES Manager to request immediate fulfillment of one or more launch requests on a specified number of excess resource instances and/or to schedule such fulfillment at one or more future times, such that the PES Manager may initiate the requested fulfillment on the specified number of excess resource instances if it is determined that the specified number of excess instances are available at the time of the requested fulfillment. The determination of whether the specified number of excess instances is available at the time may include first considering whether a private excess capacity pool (if any) for the user includes the specified number of excess resource instances, and selecting those excess instances for use if they are available. If only a subset of the specified number of excess instances is available in a private excess capacity pool for the user, the PES Manager may in some embodiments select those private excess instances to use in partially fulfilling the request, and attempt to obtain the remaining excess resource instances from the general excess capacity pool, or instead may proceed in other manners (e.g., fulfilling the request using only the subset of available private excess resources; indicating that the request fails because the private excess capacity pool does not include all of the specified number of excess instances; attempting to fulfill the request using only excess instances from the general excess capacity pool; etc.).
In addition, an excesscapacity user may interact with the PES Manager to configure one or morerequests to be processed on a specified number of excess resourceinstances to be performed as such excess instances become available,such as during an indicated future period of time, and in some suchembodiments the PES Manager may initiate the requested processing on thespecified number of excess instances when the manager determines thatthe specified number of excess resource instances is available duringthat period of time. In some embodiments, an excess capacity user mayspecify a minimum and/or maximum number of excess resource instances touse for processing a request, such that the requested processing isinitiated if the PES Manager determines that at least the minimum numberof excess resource instances is available (whether from a private excesscapacity pool and/or a general excess capacity pool), and the PESManager may initiate the requested processing on up to the maximum (ifspecified) number of excess resource instances for the request based onavailability of the excess resource instances.
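
As a purely illustrative sketch of one of the alternatives described above (private pool first, then the general pool, subject to a minimum and optional maximum), the selection might be expressed as follows. The names `Allocation` and `allocate_excess` are hypothetical and are not part of any described embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Allocation:
    private: List[str]  # instances taken from the user's private excess pool
    general: List[str]  # instances taken from the general excess pool


def allocate_excess(private_pool: List[str], general_pool: List[str],
                    minimum: int, maximum: Optional[int] = None) -> Optional[Allocation]:
    """Hypothetical helper: draw on the private pool first, then the general pool.

    Returns None when fewer than `minimum` excess instances are available,
    mirroring the case in which the requested processing is not initiated.
    """
    total = len(private_pool) + len(general_pool)
    if total < minimum:
        return None
    # Use up to the maximum (if specified) based on availability.
    wanted = total if maximum is None else min(maximum, total)
    from_private = private_pool[:wanted]
    from_general = general_pool[:wanted - len(from_private)]
    return Allocation(list(from_private), list(from_general))
```

Other policies mentioned in the text, such as failing the request when the private pool alone is insufficient, would replace the fall-through to the general pool.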

After an instance request from an excess capacity user is received, the PES Manager may select which available resource instance to use for the instance request if the manager determines that there is an appropriate number of resource instances with sufficient resource capacity to process the instance request, whether from a private excess capacity pool and/or a general excess capacity pool. For example, the PES Manager may randomly select an appropriate number of excess resource instances from a pool of available resource instances. In other embodiments, instances may be selected on the basis of one or more other factors, such as a predicted length and/or likelihood of continued availability of the resource instances, a physical proximity of the specific resource instances to one or more other resource instances, a geographic location of the one or more resources, etc. Furthermore, if one or more resource instances have been dedicated for use by a particular user, those particular instances may be the only ones used as part of a private excess capacity pool for that particular user.

As previously noted, handling of instance requests for excess capacity users on excess resources may be temporary, such that the PES Manager may automatically terminate instances when other preferred uses for the excess resources arise. In such cases, the instances may be automatically terminated (e.g., aborted, shut down, hibernated, etc.), such that the resource nodes are free for other purposes and no longer available for excess capacity users. In addition, as discussed in greater detail elsewhere herein, a processing state of those instance requests may be saved before the processing is terminated, such as to enable a later restart of the user instances. Furthermore, there may be multiple excess resource instances currently processing requests on behalf of excess capacity users that may be capable of satisfying the number of resource instances for the other purposes, and in such cases the PES Manager may determine which of the excess resource nodes to free for the other purposes based on various factors (e.g., by first reclaiming excess capacity instances from a private excess capacity pool of a user for use in fulfilling a request from that user for dedicated capacity use; or by using a determined priority among the current requests of the excess capacity users, such as based on time submitted, bid prices, etc.). In some embodiments, at least some of the terminated requests may have their fulfillment migrated and/or re-initiated on one or more other available excess resource instances (if any), such as immediately or at a later time.
In some such cases, if there are not enough excess resource instances available to satisfy all of the current excess capacity users who have requests for processing, the PES Manager may determine to terminate fulfillment of one or more additional instance requests on one or more other excess resource instances such that the one or more instance requests initially terminated on the specific resource instances may be reinitiated on the newly freed excess instances. After the PES Manager automatically terminates processing of a request for a user, the PES Manager may automatically re-initiate the instances for the launch requests as excess resource instances become available. At least some terminated requests may be automatically migrated and/or reinitiated on one or more other computing systems and/or program execution services with sufficient resources available to fulfill the requests, including one or more resources available via variable capacity functionality provided to variable capacity users, dedicated capacity functionality provided to dedicated capacity users, and/or one or more third-party computing systems (not shown) external to the PES.
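
The choice of which excess resource nodes to free can be illustrated with a minimal sketch. It assumes, as one possible combination of the factors named above, that instances in the reclaiming user's own private pool are freed first, followed by the lowest bids, and among equal bids the most recently submitted request. The `RunningRequest` type and `select_for_termination` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningRequest:
    request_id: str
    node: str
    bid: float                 # bid price of the excess capacity request
    submitted: int             # time interval in which the request was submitted
    pool_owner: Optional[str]  # owner of the private pool holding the node, if any


def select_for_termination(running: List[RunningRequest], nodes_needed: int,
                           reclaiming_user: str) -> List[RunningRequest]:
    """Pick which running excess requests to terminate (illustrative policy only)."""
    def key(r: RunningRequest):
        in_own_pool = 0 if r.pool_owner == reclaiming_user else 1
        # Lower tuples are terminated first: own pool, then low bid, then newest.
        return (in_own_pool, r.bid, -r.submitted)
    return sorted(running, key=key)[:nodes_needed]
```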

In addition, various types of instance requests may be better suited for processing in such a temporary environment as provided by excess capacity pools, such as instance requests that are relatively tolerant of unexpected interruptions due to occasional termination. In some embodiments, excess capacity users may submit one or more instance requests on the excess capacity that are designed to intermittently track and/or save progress (e.g., intermediate results, current runtime state, etc.), so that the handling may be re-initiated at a future time. In addition, when the PES Manager automatically terminates instances on excess resource instances, the PES Manager may automatically detect the current system and/or state, such that the detected state may be saved and/or provided to a respective excess capacity user such that fulfillment can be resumed in a similar state. Alternatively, if a particular executing program corresponding to the request is able to save its own execution state, the PES Manager may instead notify the program to perform its own execution state save before terminating the instance.

As previously noted, a user having submitted a reservation request and received a private excess capacity pool can receive priority access to any resource capacity in that private excess capacity pool. If, however, the private excess capacity pool includes excess capacity that is not currently in use by the associated user for the private pool, that currently available excess capacity may be made temporarily available via a general excess capacity pool to other users. In addition, access to excess resource capacity from a general excess capacity pool for processing requests on behalf of multiple excess capacity users may be based on priority among the excess capacity users in at least some embodiments, such that if there is contention for some amount of the excess resource capacity between two or more requests, the request having a higher associated priority will be provided access to use the contended amount of excess resource capacity.

In at least one illustrative embodiment, an instance request with the highest bid amount (e.g., a highest maximum bid) is given priority over instance requests with lower bids, with ties between bid amounts able to be resolved based at least in part upon other factors (e.g., which request was received first). In some embodiments, one or more of the excess capacity users may interact with an embodiment of the PES Manager to bid (e.g., auction-style) on access to available excess resource capacity (e.g., currently available and/or available at one or more future times) of the general excess capacity pool, such that the user with the winning bid may receive the access to the available excess resource capacity. In some embodiments, the PES Manager can automatically terminate fulfillment of lower priority instance requests that are currently being fulfilled using excess resource capacity in favor of processing higher priority instance requests using the excess resource capacity. In some embodiments, such as non-priority based embodiments, the PES Manager can instead occasionally rotate through pending instance requests to fulfill using excess resource capacity, such that each instance request may be provided some amount of processing time.
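
The bid-based priority just described (highest bid first, ties resolved by which request was received first) can be sketched as a simple ordering. The `InstanceRequest` fields and the `grant_excess_nodes` function are assumptions made for illustration, and any values passed to them are hypothetical rather than taken from the figures.

```python
from typing import List, NamedTuple


class InstanceRequest(NamedTuple):
    user: str
    bid: float     # bid amount per node per time interval
    received: int  # time interval in which the request was received
    nodes: int     # number of excess nodes requested


def grant_excess_nodes(requests: List[InstanceRequest], free_nodes: int) -> List[str]:
    """Grant free excess nodes in priority order; returns the users granted."""
    granted = []
    # Highest bid first; among equal bids, the earlier request wins.
    for req in sorted(requests, key=lambda r: (-r.bid, r.received)):
        if req.nodes <= free_nodes:
            granted.append(req.user)
            free_nodes -= req.nodes
    return granted
```

A non-priority embodiment would replace the sort with a rotation through the pending requests.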

In some embodiments, the PES Manager may provide interactive feedback to one or more of the excess capacity users that are interacting with the PES Manager to configure and/or request fulfillment using an amount of excess resource capacity of a general excess capacity pool. For example, interactive feedback may include indications of when and/or for how long instance requests may require the indicated amount of excess resource capacity, as may be based at least in part upon current and/or predicted demand or usage. In one illustrative embodiment, the PES Manager may indicate suggested bid levels along with corresponding information indicating when processing will likely occur and/or complete, such that the excess capacity user may reconfigure (e.g., by specifying a different bid, a different amount and/or type of resource capacity, a different duration of processing, etc.) a request to meet the desires of the excess capacity user, such that the request may be processed at an earlier time, etc.

FIG. 5 illustrates one example process 500 for processing an instance request using some of the approaches discussed above. It should be understood for the various processes described herein, however, that additional, fewer, or alternative steps can be performed in similar or alternative orders, or in parallel, within the scope of the various embodiments unless otherwise stated. In this example, an instance request is received that is associated with a user 502. The association with the user can be determined through any appropriate technique, such as by determining an originating IP address of the request or determining a session identifier associated with the request. The type of instance request also can be determined 504. It should be understood that various determinations of the type of request can be performed as part of a single determination in various embodiments. As used in this example, the “type” of the instance request will correspond to whether the instance request is to be fulfilled using dedicated, excess, or variable capacity as discussed elsewhere herein, although other types of instance requests can be used as well. In at least some embodiments, a reservation request could have been previously received from the user in order to obtain dedicated or reserved capacity as discussed elsewhere herein. If the request is determined to be a dedicated capacity instance request 506, the instance request can be fulfilled (e.g., the appropriate instance(s) can be launched and maintained) using dedicated capacity storage 508 as discussed herein.

When an instance request is to be fulfilled using dedicated capacity, the PES Manager can first ensure that the corresponding dedicated capacity is not already in use or scheduled for use for other purposes. If sufficient dedicated capacity is not available, an error message or other similar response can be provided. The PES Manager can also ensure that the instance request was received within an appropriate use period, and/or may otherwise authorize the instance request (e.g., authenticate the request, authorize the subscriber user, etc.). The PES Manager can determine whether the allocated dedicated capacity to be used for the instance request is currently in use as part of fulfilling a prior request from the requester as part of a private excess capacity pool for the requester. If not, the service can fulfill the instance request using the dedicated capacity. In some embodiments, the “dedicated” requests can correspond to reserved capacity for the user, such that the user is able to utilize the reserved capacity without submitting a bid price as discussed elsewhere herein. In other embodiments as discussed elsewhere herein, reserved capacity is treated as a separate type of capacity, with separate determination rules, etc.

If the instance request is not a dedicated type, the determination might be made that the request is an excess capacity type of request 510. If so, the bid price for the request is determined 512. As discussed later herein, the bid price might depend on a number of different factors, such as various capacity levels provided by the currently available resources. For example, the user might submit a bid price for a rate of I/O operations that can be provided by the currently available capacity, but might submit a higher bid if the capacity also has a bandwidth value over a certain threshold. Various other options exist as discussed later herein. Once the bid price is determined, a determination is made as to whether that bid at least meets the current market price for capacity with the currently available attributes 514. If the bid price at least meets the current market value, it is also determined whether the bid price exceeds the bid price of other bidding users (or if the request otherwise has priority over the other pending requests) and whether the available capacity meets all the requirements for the bid 516, such as where the bid for the request requires certain capacity levels, such as a maximum average latency, without which a bid should not be accepted. If an acceptable type of capacity is available for the request and the bid price is acceptable, the bid for the request is accepted 518 and the instance request is fulfilled using excess capacity 520.
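
The checks at steps 514 through 518 can be summarized in a short sketch. The function names, the attribute keys (e.g., `iops`, `latency_ms`), and the representation of requirements as minimum and maximum limits are assumptions made for illustration only. Ties between equal bids are not resolved here.

```python
from typing import Dict, Iterable


def meets_requirements(capacity: Dict[str, float],
                       minimums: Dict[str, float],
                       maximums: Dict[str, float]) -> bool:
    """True if the available capacity satisfies every limit attached to the bid."""
    at_least = all(capacity.get(k, 0.0) >= v for k, v in minimums.items())
    at_most = all(capacity.get(k, float("inf")) <= v for k, v in maximums.items())
    return at_least and at_most


def accept_bid(bid: float, market_price: float, competing_bids: Iterable[float],
               capacity: Dict[str, float], minimums: Dict[str, float],
               maximums: Dict[str, float]) -> bool:
    """Illustrative version of steps 514-518 of process 500."""
    if bid < market_price:                          # step 514: meets market price?
        return False
    if any(other >= bid for other in competing_bids):
        return False                                # step 516: exceeds other bids?
    # step 516 (continued): capacity must meet all requirements for the bid
    return meets_requirements(capacity, minimums, maximums)
```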

If the excess capacity corresponds to a private excess capacity pool, that private pool can be selected to receive the indicated excess capacity, and otherwise a general excess capacity pool can be selected. In some embodiments, multiple pools may be provided, as previously discussed with respect to private and general excess capacity pools, and/or in other manners (e.g., various different types of capacity may be available and grouped into corresponding private and/or general pools associated with the type of capacity). When an instance request is received to be processed using some amount of available excess capacity on behalf of an excess capacity user, a corresponding private and/or general excess capacity pool is selected from which the excess capacity is to be obtained to process the request. In some embodiments, additional information for the request may include configuration information, such as indications of an amount and/or type of capacity requested (e.g., including minimum and maximum amounts), a bid amount (e.g., including minimum and maximum bid amounts), an expiration time, a particular excess capacity pool to use (e.g., only a particular private excess capacity pool associated with the user, or to use a particular private excess capacity pool if available but to otherwise use a general excess capacity pool), etc. In some embodiments, some or all such additional information may instead be included as part of a separate configuration and/or registration process performed by, or on behalf of, the excess capacity user. In some embodiments, feedback may be provided to an excess capacity user of one or more types, such as may indicate likely characteristics of the requested processing (e.g., a likely time that the request will be processed, a likely duration of processing, a likely excess capacity pool to be used, etc.) and/or optional other information (e.g., suggested configurations).
The instance request can be added to a group of current instance requests for fulfillment on excess capacity for the selected excess capacity pool to be used. In other embodiments, rather than add the instance request to a group of other requests, the service may instead attempt to immediately satisfy the instance request, such as by determining whether there is available excess capacity to launch an instance at that immediate time (e.g., in a particular private excess capacity pool). As discussed, the excess capacity can be part of a pool of unused dedicated capacity or reserved capacity, such that the processing might be terminated at any time when a user with a higher priority submits a request to be processed using that capacity.

If the instance request is not a dedicated or excess type request, the capacity is not currently available for one of those types of requests, the bid price for the request is below market price, or for any of a number of other reasons the request is not able to be processed using dedicated or excess capacity, the instance request can be fulfilled using on-demand variable capacity. While the determination of request type might appear ordered or hierarchical from this example, it should be understood that there can be a single determination, concurrent determinations, or any other appropriate determination of request type, etc. For any such instance request, a determination is made as to whether there is any variable capacity available to handle the request 522. As discussed elsewhere herein, an instance request processed using variable capacity may receive no guarantees for the respective instance(s). If capacity is available, and if any minimum criteria for the request are satisfied, the instance requests can be fulfilled using the variable capacity 526.

When an instance request is to be processed using on-demand, variable capacity usage, it is determined whether the request is to be processed using currently available capacity (e.g., a current request for immediate processing, a previously scheduled request, etc.). As discussed elsewhere herein, such a request may specify various types and/or amounts of capacity with which to execute one or more programs on behalf of a variable capacity user. If currently available capacity is to be used, the instance request is fulfilled using the available variable capacity. If such capacity is not available, the user or other source of the request can be queried to determine whether to move the instance request to a queue for use with excess resource capacity, while such a move can be performed automatically in at least some embodiments and situations. In addition, in some embodiments one or more instances of excess capacity users can be terminated, in response to a request to launch instances on variable program execution capacity, in order to free variable program execution capacity. If capacity is not available and the request is not able to be moved for use with excess capacity, the request for variable capacity can be denied 524, and an appropriate response or error message can be sent to the user or other source of the request. Further detail for these and other steps in such a process can be found in co-pending U.S. patent application Ser. No. 12/686,273, filed Jan. 12, 2010, entitled “Managing Private Use of Program Execution Capacity,” which is hereby incorporated herein by reference.
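
The overall flow of process 500 can be condensed into a small routing sketch. It assumes the ordered, fall-through reading of the example (dedicated, then excess, then variable, then denial), although the determinations may equally be made as a single or concurrent determination. The `Capacity` enumeration, `route_request` function, and its parameters are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Capacity(Enum):
    DEDICATED = "dedicated"
    EXCESS = "excess"
    VARIABLE = "variable"
    DENIED = "denied"


def route_request(request_type: str, dedicated_free: bool, excess_free: bool,
                  variable_free: bool, bid: Optional[float] = None,
                  market_price: float = 0.0) -> Capacity:
    """Illustrative routing of an instance request through process 500."""
    if request_type == "dedicated" and dedicated_free:
        return Capacity.DEDICATED          # steps 506, 508
    if (request_type == "excess" and excess_free
            and bid is not None and bid >= market_price):
        return Capacity.EXCESS             # steps 510-520
    if variable_free:
        return Capacity.VARIABLE           # steps 522, 526
    return Capacity.DENIED                 # step 524
```

The sketch omits the alternative in which a request that cannot be served from variable capacity is moved to a queue for excess capacity.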

As previously noted, excess capacity users may be charged various fees in conjunction with use of excess resource capacity, such as may be based upon whether the excess resource capacity is part of a private excess capacity pool for that user, based on a quantity of resource capacity used and/or one or more use factors (e.g., number of times used, amount of shared resources consumed, amount of time capacity is used, etc.), and/or based on one or more bids from the one or more excess capacity users for use of the resource capacity. In some embodiments, a portion of the fees charged to the one or more excess capacity users who use a general excess resource capacity pool may be supplied to one or more other users who provided resource capacity in that general excess capacity pool (e.g., one or more dedicated capacity users, one or more other users, etc.). For example, various other users may be paid a proportional share of an amount of the fees collected from excess capacity users, such as a proportional share reflective of the amount of resource capacity contributed by the other users over time. In some cases, such fees supplied to the other users may be automatically credited and/or paid to the other users by the PES provider, such as to offset other charges incurred by those other users, such as charges incurred by dedicated capacity users.
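
A proportional share of collected fees can be illustrated with a short calculation. The `share_fees` function and the `provider_cut` parameter (the fraction retained by the PES provider) are assumptions introduced for this sketch. The text states only that a portion of the fees may be supplied to contributing users.

```python
from typing import Dict


def share_fees(fees_collected: float, contributed: Dict[str, float],
               provider_cut: float = 0.0) -> Dict[str, float]:
    """Split fees among contributors in proportion to capacity contributed.

    `contributed` maps each contributing user to the amount of resource
    capacity (e.g., node-hours) that user supplied to the general pool.
    """
    distributable = fees_collected * (1.0 - provider_cut)
    total = sum(contributed.values())
    if total == 0:
        return {user: 0.0 for user in contributed}
    return {user: distributable * amount / total
            for user, amount in contributed.items()}
```

For example, with 100.0 in fees and contributions of 30 and 10 node-hours, the two contributors would be credited 75.0 and 25.0 respectively when `provider_cut` is zero.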

In some embodiments, the PES Manager may provide an electronic marketplace (not shown) to one or more dedicated capacity users, such that the one or more dedicated capacity users may transfer access to their dedicated resource capacity to one or more other users during the use time period of the dedicated capacity, while in other embodiments a dedicated capacity user and another user may arrange an exchange in a manner external to the PES. In some embodiments, a user may provide payment to a dedicated capacity user in exchange for access to a transferred portion of the dedicated capacity user's dedicated resource capacity, such that the purchasing user may access the transferred portions of dedicated capacity to execute programs or process requests on behalf of the purchasing user. A price for the exchanged access may be determined in various manners in various embodiments, such as via a fixed price specified by the dedicated capacity user, a price that is bid or suggested by the user, etc. In some embodiments, an exchange for dedicated resource capacity may be made such that the exchange is temporary and/or permanent. For example, an exchange may be made for a specified limited period of time and/or various intervals of time, such that the purchasing user may access the resource capacity during that specified time and/or during the various intervals, after which the dedicated resource capacity may revert back to being dedicated for use by the dedicated capacity user. In some embodiments, the exchange can be permanent, such that the purchasing user may be provided with access to the dedicated resource nodes for any remaining use period associated with the dedicated resource capacity.
In some embodiments, as part of the initial allocation of resource capacity for dedicated use by a subscribing dedicated capacity user, the PES Manager may assign one or more user tokens to the subscribing user and/or may otherwise associate the allocated capacity with the subscribing dedicated capacity user's account, such that the dedicated capacity user's use of the resource capacity may be tracked for various purposes (e.g., configuration, authorization, billing, etc.). In such embodiments, when a dedicated capacity user transfers a portion of their dedicated resource capacity to a new user, any provided tokens may be transferred to the new user and/or the portion of resource capacity may be otherwise associated with an account of the new user. In some embodiments, a provider of the PES Manager may further collect a fee in conjunction with a transfer of a portion of dedicated resource capacity from one user to another.

Although the foregoing example is described with respect to a PES that provides various types of functionality for various types of users, such as variable capacity users, dedicated capacity users, and excess capacity users, it will be appreciated that various other embodiments may exist, such as embodiments with or without one or more of the various types of users. For example, it will be appreciated that some embodiments may exist where a program execution service provides functionality for excess capacity users, but does not include variable and/or dedicated capacity users, such as where the excess program execution capacity is provided by one or more third-parties and/or affiliate entities associated with the PES, such as to allow such parties and/or entities to monetize otherwise unused resources. In addition, some of the techniques may be used in conjunction with a bid-based PES that allows users to submit requests for and/or to schedule execution of programs or processing of requests on a temporary basis on all of the resource capacity provided by the service, such that requests with higher priority at the time of execution are executed. In addition, it will be appreciated that the various types of user may each act as one or more of the other various types of user at times. As an example, a particular user who acts as a dedicated capacity user to process requests on dedicated resource capacity may also act as an on-demand variable capacity user, such as when the particular user desires additional resource capacity to process one or more requests for the respective user.

Further, in at least some embodiments unused resource capacity (e.g., unused portions of memory, unused bandwidth or throughput, etc.) may be made available for use by one or more excess capacity users, such that one or more instance requests of the one or more excess capacity users can share a resource with a dedicated capacity user and/or other excess capacity users. In some embodiments, at least some of the resource capacity that is allocated for use by dedicated capacity users may be made available for use by one or more variable capacity users, such as if it is determined that such access is unlikely to impact dedicated capacity users (e.g., in cases where accurate predictions of upcoming demand can be forecast, etc.). Furthermore, if some amount of resource capacity dedicated for use by one or more dedicated capacity users is oversubscribed (e.g., oversold, and/or provided to one or more other long term users), such that the oversubscribed capacity is unavailable for the one or more dedicated capacity users at a time that those users request use, then one or more of the requests being processed using the oversubscribed resources may be migrated to one or more other resource instances, such as may be available in one or more remote data centers and/or other computing systems.

It should be understood that even though examples discussed herein refer to a program execution service and resource capacity, the described techniques can be used to manage access to various types of computing-related resources discussed herein, and can process requests not related to a user-specific program or application. A non-exclusive list of examples of types of computing-related resources and resource capacity that may be managed for use by multiple users includes the following: persistent data storage capabilities (e.g., on non-volatile memory devices, such as hard disk drives); temporary data storage capabilities (e.g., on volatile memory, such as RAM); message queuing and/or parsing capabilities; other types of communication capabilities (e.g., network sockets, virtual communication circuits, etc.); database management capabilities; dedicated bandwidth or other network-related resources; guaranteed rates of IOPS; maximum latency guarantees; input device capabilities; output device capabilities; processor (e.g., CPU) cycles or other instruction execution capabilities; etc. In one example, a user may request one or more indicated types of computing-related resource capacity, and the PES system can automatically determine an amount of each indicated type of resource capacity (e.g., based on an explicit quantity or other amount indicated by the user in the request, based on predetermined amounts associated with particular resource types, based on available amounts of the indicated resource types, etc.) to provide for the user, such as a first amount of volatile memory and a second amount of minimum bandwidth.

FIGS. 6(a) and 6(b) illustrate an example approach to managing excess resource capacity, such as may be automatically performed by a PES Manager in at least one embodiment. In this example, the resource capacity will be described with respect to a plurality of resource nodes operable to fulfill instance requests and I/O operations, or perform other such tasks, with certain levels of throughput, bandwidth, and other such functional aspects. It should be understood, however, that any other appropriate resource can be managed using such an approach in accordance with various embodiments.

FIG. 6(a) illustrates a situation wherein instance requests from four users (A, B, C, and D) are received, where the system manages those requests using excess resource capacity from a general excess resource capacity pool. In this example, none of the users submitting requests have an associated private excess resource capacity pool. A first table 600 of information indicates usage of each of a plurality of resource nodes with respect to time, with the resource nodes including both dedicated nodes 602, 604, 606, 608 and non-dedicated nodes 610. As discussed, the usage of each node can be managed by a PES or other appropriate system or service for each consecutive block of time (t1-t12). Further, a second table 620 of information indicates information about instance requests for users A, B, C and D received by the program execution service to be processed using excess resource nodes of a general excess resource capacity pool. As illustrated, information for each instance request can include the time that the request was received, the maximum and/or minimum number of nodes required to fulfill the request, the bid amount, and an expiration time for the request. As should be understood, any appropriate alternative or additional information can be used as well.

Information contained in the second table 620 can be used by the PES Manager to determine when and how to process each instance request based at least in part upon the usage indicated in the first table 600. In the first table, blocks of time having a fill pattern of horizontal lines are in use for other instance requests, and thus not available as excess capacity. A block of time without any patterning indicates a respective resource node being available for use as excess program execution capacity during that interval of time. A block of time containing a letter indicates that a request or program is being processed or executed on the respective resource node during that period of time, where the resource node during that time offered excess capacity. The length of time of each block can be any appropriate period of time, such as ten minutes, an hour, a day, or any other appropriate period.

As illustrated, instance request A was received at a time that approximately corresponds to time block t2 (e.g., just before or during the corresponding time interval), indicating a preference to execute on a single excess capacity resource node, with a bid price of $0.05 per hour of use of the single excess resource node and no specified expiration (e.g., indicating that the request is to continuously execute and/or re-execute until execution or processing is completed). In this example, each fulfillment may provide approximately the same amount of resource capacity (throughput, bandwidth, latency, etc.) per time interval, while in other embodiments the capacity of the various nodes can vary with respect to at least one functional aspect (e.g., storage capacity or maximum rate of I/O operations) such that a request might also specify at least one minimum or preferred aspect of a node to be used in processing the request. In other embodiments, the various requests may be configured in other ways, such as to include one or more of a specified particular type of resource node to use (e.g., and/or characteristics of such resource nodes), a minimum and/or maximum bid amount, and/or one or more other configurations (e.g., fault tolerance requirements, execution locality and/or proximity preferences, etc.). In addition, other types of information may be indicated in some embodiments, such as one or more particular programs to be executed for each request, a total amount of aggregate resource node time intervals for the request, etc.

The first table 600 includes a number of dedicated capacity resource nodes 602, 604, 606, 608, which may include resource nodes that have been allocated for dedicated access to one or more specific dedicated capacity users. The table also includes one or more non-dedicated resource nodes 610, which may be available for other types of resource capacity (e.g., on-demand variable capacity). In one example, a dedicated capacity user (not shown) may have priority access to a specific resource node 602 for a specified period of time (e.g., a year), such that the user may access the dedicated node 602 to launch instances and fulfill I/O operations on behalf of the user at any time during the specified period of time, although such access may be subject to a delay period and/or one or more interactions on the part of the user to gain access (e.g., notifications of an intent to use the resource node 602 at a particular time, a request to execute programs on the resource node 602, etc.). In other embodiments, the dedicated capacity user (not shown) may instead have priority access to a resource node with equivalent computing resources as the dedicated resource node 602 (e.g., equivalent processing capacity, memory, bandwidth, etc.), but not have a particular allocated resource node, such that the user may be provided access to any of the resource nodes that are equivalent to the dedicated node and that are available for use. In various embodiments, the PES Manager can ensure that a sufficient number of equivalent dedicated resource nodes is available for use by dedicated users who may have priority access to such nodes in various ways (e.g., maintaining a group of such resource nodes and/or otherwise reserving a specific amount of such nodes, etc.).

During time intervals t1-t2, dedicated node 602 is determined to include excess capacity (in at least one functional aspect), such as may be based on being unused by a dedicated capacity user to whom the resource node is allocated. During this period, the node 602 can be made available for use by excess capacity users. In some embodiments, the dedicated capacity user can indicate to the program execution service that the resource node is available for excess capacity, such as at some time prior to time interval t1. In some embodiments, the PES Manager can automatically determine that at least one aspect of the resource node 602 is not being used. In the illustrated example, all the other resource nodes 604, 606, 608, 610 are not determined to be available during that time interval. When instance request A is received around time interval t2, the PES Manager determines to process the request using the excess capacity available on resource node 602. At the time, there are no other pending instance requests from excess capacity users, so there is no other bid to compare and the user-specified fee of $0.05/hour is accepted for processing of request A on node 602. In some embodiments, the program execution service may utilize a fixed price (or other designated) fee when there are no competing bids.

At time interval t3, the program execution service determines that the resource node 602 is no longer available to satisfy excess capacity requests (e.g., based on an indication received from a dedicated capacity user reclaiming use of the resource node), whereby the processing associated with instance request A is terminated on that resource node 602. At interval t4, the PES Manager determines that two resource nodes 602, 604 with sufficient resources to execute instance request A are available as excess capacity nodes, and determines to reinitiate processing for request A on dedicated node 602. In some embodiments, node 604 might not be selected if it is indicated that node 604 is available, but not preferred, for excess capacity use. In some cases, a dedicated node user might pay extra to always have the node available without any pending requests, tasks, or applications of other users. In other cases, a resource node may not be preferred for various reasons, such as the node having a short and/or uncertain duration of availability (e.g., as determined by the program execution service, such as may be based on indications from the dedicated capacity user to whom the node is allocated, based on prior history of use, based on forecasted use, etc.). In some embodiments, the program execution service may have a preference for selecting a resource node with a longer likely availability for executing a request of an excess capacity user, such as to minimize having to stop and restart processing on the various resource nodes. If another request was received at substantially the same time, however, the service could determine to use node 604 to process that additional request.

In this example, instance request B is received around interval t5, when there is only one excess resource node 602 available. Because there is only one node available for two instance requests, the service must determine which request to process on that node during the time interval t5. In this example, the bid amount for request B ($0.08/hour) is higher than the bid amount for request A ($0.05/hour), such that the program execution service determines to terminate the processing of request A in favor of request B. Other reasons for favoring one instance request over another can be used as well, such as where one instance request is associated with a higher priority than another request, etc. In this example, instance request B is processed continuously on the dedicated node 602 for a fee of $0.08/hour over the time intervals t5-t6. Further, at time interval t6 two resource nodes are available as excess resource nodes, each having sufficient resources for processing request B. Since instance request B specifies a maximum of two resource nodes, and has a higher bid amount than request A, request B can continue to be processed using node 602, and also be processed using node 608, with request A remaining terminated for the time being.
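The selection between requests A and B described above can be sketched as follows. This is a minimal illustration only, not the method of any particular embodiment: the names InstanceRequest and assign_excess_nodes are hypothetical, the ranking uses bid amount alone, and breaking ties by arrival time is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class InstanceRequest:
    name: str
    bid_per_hour: float
    max_nodes: int
    received_at: int  # index of the time interval in which the request arrived


def assign_excess_nodes(available_nodes, pending_requests):
    """Hand out excess nodes to the highest bidders first.

    Ties on bid amount are broken in favor of the earlier request (an assumption).
    Requests that receive no node are simply absent from the result.
    """
    ranked = sorted(pending_requests, key=lambda r: (-r.bid_per_hour, r.received_at))
    free = list(available_nodes)
    assignment = {}
    for request in ranked:
        granted = free[:request.max_nodes]
        free = free[len(granted):]
        if granted:
            assignment[request.name] = granted
    return assignment


request_a = InstanceRequest("A", 0.05, max_nodes=1, received_at=2)
request_b = InstanceRequest("B", 0.08, max_nodes=2, received_at=5)

# Interval t5: one excess node, so only the higher bidder (B) is served.
t5 = assign_excess_nodes([602], [request_a, request_b])
# Interval t6: two excess nodes, both taken by B, which allows up to two nodes.
t6 = assign_excess_nodes([602, 608], [request_a, request_b])
```

In this sketch request A stays pending at both intervals, matching the example in which request A remains terminated while request B holds the available nodes.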

At time interval t7, three dedicated resource nodes 602, 606, 608 are determined to be available as excess capacity resource nodes, and instance request C is received. In this embodiment, fulfillment of instance request B is terminated on dedicated resource nodes 602 and 608, and portions of instance request C are fulfilled on all three of the available excess resource nodes based on request C having an indicated preference to execute on a maximum of three nodes and having a higher bid amount (e.g., $0.10/hour) than requests A and B, thus providing a higher priority for instance request C than for requests A and B. At time interval t8, one of the dedicated nodes 608 is determined to no longer be available as an excess resource node, with the node having been reclaimed or otherwise having become unavailable. The portion of request C being processed on that node is terminated, but the portions on nodes 602 and 606 continue processing. Node 602 similarly becomes unavailable at t9, with request C only being processed by node 606. In this example, request C specified termination after three hours, such that processing of request C is terminated after time t9. Since processing of request B has not been completed and B has a higher bid price than request A, the processing of request B is reinitiated on the single available resource node 606.

During time interval t10, the processing of instance request B ends (e.g., based on the associated program(s) completing their execution after five aggregate hours of execution, or instead based on an instruction received (not shown) to cancel request B from the excess capacity user who submitted request B), and instance request B is treated as no longer being a pending request to be satisfied. In addition, at or near this time, instance request D is received having a bid amount equivalent to previously received instance request A ($0.05/hour). In this case, assuming no other priority information, the PES Manager can determine to reinitiate fulfillment for request A on available dedicated resource node 606 at next time interval t11 rather than for instance request D, based at least in part upon request A having been received at an earlier time than D and/or already having at least a portion of the processing completed. Request A can continue to be processed on node 606 until some point in the future when the processing is completed or one of the other situations discussed herein occurs.

At interval t12 one of the other resource nodes 610 becomes available for use as excess resource capacity. The node might be a non-preferred excess capacity resource node, but request D is nonetheless processed using that node 610 since no other excess resource nodes are available for interval t12.

At least one component of the program execution service can be configured to track usage of the resource nodes for each user, such that each user is charged an amount of fees commensurate with the bid amounts and periods of usage. In addition, the program execution service may also track which of the resource nodes were used and/or were made available for use by excess capacity users, such that one or more dedicated users associated with those resource nodes may be given some portion of the fees collected from the excess capacity users.

FIG. 6(b) illustrates a similar situation, but where one of the users (here user B) has a private excess resource capacity pool. The information displayed in the tables 640, 660 reflects the changes due to the use of the private excess resource capacity pool. In this example, user B is a dedicated capacity user, and has been allocated the dedicated use of a resource node 602 for a time period that includes time intervals t1-t12. The fill pattern for node 602 has been adjusted in this FIG. 6(b) to indicate that any unused capacity of this resource node 602 is available for use as a private excess capacity pool for user B. In this example, the time intervals of t1-t3 and t9-t12 for resource node 602 correspond to dedicated use of the node by user B, and the resource node 602 is available during the time intervals of t4-t8 for use as part of the private excess resource capacity pool for user B. As discussed in greater detail elsewhere, requests from user B have priority for use of the private excess resource capacity pool.

The assignments for time intervals t1-t6 are the same in FIG. 6(b) as in FIG. 6(a). For example, instance request B was already assigned to use resource node 602 for time intervals t5-t6, based on request B having a higher priority for the general excess resource capacity pool than request A. However, after instance request C is received for time interval t7, the assignments change in FIG. 6(b) relative to FIG. 6(a) based on the use of the private excess resource capacity pool for user B. In particular, in FIG. 6(a) instance request C was given higher priority than instance requests A and B for the general excess resource capacity pool, and thus all three excess resource capacity nodes available at time interval t7 in FIG. 6(a) began to process portions of request C. With respect to FIG. 6(b), however, instance request B continues to have the highest priority at time interval t7 for the excess resource capacity in user B's private excess resource capacity pool. Accordingly, the program(s) for request C begin to execute on the other excess resource nodes 606 and 608 at time interval t7 in FIG. 6(b), but the program(s) for request B continue to execute on resource node 602 at that time interval in FIG. 6(b) in a manner different from that of FIG. 6(a). In particular, since an instance request from user B is available at time interval t7, that request (in this example, request B) is given priority to use the excess resource capacity of resource node 602 that is part of user B's private excess resource capacity pool. Similarly, at subsequent time interval t8, if the program(s) for request B had continued to execute, those program(s) would have continued to execute on resource node 602 for the same reasons. However, in this example request B ends after five aggregate hours of processing, such that the excess resource capacity for resource node 602 at time interval t8 returns to the general excess resource capacity pool, and the program(s) of request C begin to execute on the resource node 602 for time interval t8.

The use of such a private excess resource capacity pool can provide a user with various benefits. For example, a request from that user can be completed more rapidly using the dedicated pool, as is evidenced by instance request B being completed at interval t7 in FIG. 6(b) versus interval t10 in FIG. 6(a). In some embodiments, request B can be performed more cheaply for user B in the second situation, as the private excess resource capacity pool for user B is charged to user B at the same rate as the incremental ongoing cost of using the dedicated resource node 602, which in this example is $0.04 per time interval hour for the dedicated usage. The performance of instance request B in FIG. 6(a) would have cost the bid price for request B of $0.08 per time interval hour (i.e., twice the incremental ongoing cost of using the dedicated resource node) for each of the five aggregate hours of processing. The only period for which user B did not get the lower dedicated rate was when request B was processed using node 608 during interval t6, where user B was charged the bid amount of $0.08 as in FIG. 6(a). Thus, the total cost for performing request B in FIG. 6(b) is $0.24, while the total cost for performing request B in FIG. 6(a) is $0.40. While the absolute numbers are small in this example based on the limited amount of use of excess resource capacity, it will be appreciated that increasing such excess resource capacity by a significant amount in a real-world situation may result in correspondingly larger actual cost savings (e.g., if use is increased ten-thousand-fold, the corresponding savings would be approximately $1600 in this example, based on actual costs of $2400 rather than $4000).
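The cost comparison above can be checked with a short calculation. The sketch below is illustrative only; the function name total_cost is hypothetical, and the split of four private pool hours and one general pool hour follows from the statement that only the hour on node 608 during interval t6 was charged at the bid price.

```python
from decimal import Decimal

DEDICATED_RATE = Decimal("0.04")  # incremental rate for the private excess pool
BID_RATE = Decimal("0.08")        # request B's bid for the general excess pool


def total_cost(private_pool_hours, general_pool_hours):
    """Cost of a request given the hours served from each pool."""
    return private_pool_hours * DEDICATED_RATE + general_pool_hours * BID_RATE


# FIG. 6(b): four hours from the private pool, one hour on node 608 at the bid price.
cost_fig_6b = total_cost(private_pool_hours=4, general_pool_hours=1)
# FIG. 6(a): all five aggregate hours at the bid price.
cost_fig_6a = total_cost(private_pool_hours=0, general_pool_hours=5)
savings = cost_fig_6a - cost_fig_6b

print(cost_fig_6b, cost_fig_6a, savings)
```

Decimal arithmetic is used here so that the cent values compare exactly; the figures are per-request totals and scale linearly with the number of hours used.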

Furthermore, in other situations instance request B may be fulfilled for even lower cost than illustrated in the example of FIG. 6(b). For example, in FIG. 6(b) one of the five aggregate hours of processing for request B (i.e., 20% of the total aggregate hours) was performed using the general excess resource capacity pool (i.e., use of the time interval of t6 for resource node 608), and an otherwise available hour from the private excess resource capacity pool (i.e., time interval t8 for dedicated resource node 602) was not used. In some embodiments, user B may specify that request B (and/or any other requests for user B) is only to be executed using user B's private excess resource capacity pool, rather than to also use the general excess resource capacity pool as a supplement to the private excess resource capacity pool, such that user B would be charged the lower fee for each unit time of processing. Increased usage of the private excess resource capacity pool also may be triggered, for example, by request B specifying a maximum of one resource node (so that only the private excess resource capacity pool is used if it is available).

In some embodiments, the private excess resource capacity pool can be used in other manners to provide additional benefits. For example, instance request B in FIG. 6(b) might have a higher priority than instance request C for use of the general excess resource capacity pool (e.g., if request C has a bid price of $0.07 rather than $0.10). Further, request B might use six aggregate hours to complete processing rather than five, and request B might specify to use a maximum of one resource node rather than two. In such a situation, and using the allocation scheme previously described for FIG. 6(b), request B would not be selected to use resource node 608 in time interval t6 from the general excess resource capacity (given the maximum of one resource node and the preference for using the private excess resource capacity pool), but would be selected to continue to use resource node 602 in time interval t8 from the private excess resource capacity pool. However, in that situation, request B would still have one additional hour of processing to complete at the end of time interval t8, but the availability of resource node 602 in the private excess resource capacity pool at time interval t9 would disappear based on the resumed use of dedicated capacity by user B at that time interval.

One option in such a situation would be to terminate the instance(s) for request B on resource node 602 at the end of time interval t8, and to immediately reinitialize the instances for one additional hour on resource node 606 during time interval t9. In some embodiments, in order to avoid the overhead of terminating and then restarting the instances for request B when only a short time remains until completion, processing for request B could instead be allowed to complete on resource node 602 during some or all of time interval t9. While user B's desire to resume dedicated capacity use in time interval t9 could be deferred in this situation, an alternative that accommodates resumption of dedicated capacity use in time interval t9 includes selecting another resource node to temporarily use for user B's dedicated capacity use during at least time interval t9, such as resource node 606. In this manner, user B receives the desired dedicated capacity use in time interval t9, and the processing of request B is allowed to complete more efficiently and quickly. Nonetheless, the use of resource node 602 during time interval t9 for the completion of the processing for request B may not be treated (for cost purposes) as being part of the private excess resource capacity pool, such that user B may receive the dedicated capacity use price of $0.04 for the use of resource node 606 during time interval t9, but the execution of the program(s) for request B using resource node 602 during time interval t9 may be charged at the general excess resource capacity pool price of $0.08 to reflect request B's bid price. It will be appreciated that other alternatives may similarly be used in other embodiments and situations.

In addition, node usage and allocation may differ in other embodiments where the nodes do not have equivalent resource capacity (e.g., bandwidth, IOPS, latency, compute, etc.) and/or characteristics (platform specification, etc.). In some such embodiments, various requests can include indications of one or more specific types of resource node for use in fulfilling those instance requests, and those requests may only be fulfilled using the corresponding specified type of resource node. Further, rather than excess capacity being based on unused dedicated resource nodes and other resource nodes as illustrated, embodiments may exist where only one group of resource nodes and/or additional groups of resource nodes may contribute to excess capacity resource nodes available for executing requests of excess capacity users. Furthermore, in some embodiments, at least some of the resource nodes may include resource nodes provided to the program execution service by one or more third parties.

In some embodiments, users might submit multiple bids that are based upon multiple types and/or combinations of resource capacity. For example, a user might be willing to bid $0.04/hr for a node of compute capacity if that node can provide at least 100 IOPS, but might be willing to bid $0.06/hr for a node of compute capacity if that node can provide at least 200 IOPS. In another example, the user might bid $0.06/hr for 200 IOPS, and might not care how many nodes need to be used to provide that rate of IOPS. There can be various other criteria or options that a user might use to bid for resource capacity.

For example, consider the examples illustrated in FIGS. 7(a) and 7(b). For simplicity these examples do not include information such as number of nodes and expiration time, but it should be understood that such information can be utilized as well using approaches discussed elsewhere herein. FIG. 7(a) illustrates a first example 700 indicating how a user might submit bids based on multiple capacities or functional aspects of various shared resources. In this example, a user is able to provide bid amounts for two different levels of service for each of four different capacity areas, although different numbers of bids and selections of capacities can be used in other embodiments. In this example, the user has submitted bids that are higher for compute capacity B (e.g., a server with a greater number of processors) than for compute capacity A. As illustrated, the user also is willing to bid more, for most combinations, for IOPS rate B than for IOPS rate A. The user on average is not particularly worried about latency, such that the user is not willing to bid more for a resource that has a shorter amount of maximum latency. It also can be seen that the user is not willing to bid anything for resources with bandwidth rate A, and is only willing to submit bids for bandwidth rate B, such as where a user application requires a minimum bandwidth greater than bandwidth rate A.

Such information can be used to generate bid amounts for a user instance request based on one or more aspects of an available resource. For example, consider a resource node becoming available as excess capacity that has compute capacity A, IOPS rate A, bandwidth rate B, and maximum latency B. Using a set of bids such as that illustrated in FIG. 7(a), an appropriate bid amount can be determined using any of a number of different algorithms. In one embodiment, the algorithm can select the highest bid amount that applies to the available resource. Here, the set of bids indicates that the user is willing to bid $0.06 for resource capacity when that resource has both maximum latency B and IOPS rate A, such that the system would select a bid of $0.06 for the user. In another embodiment, the system might look at the minimum bid for the resource, as the other combinations for this resource have an associated bid price of $0.04, such that a bid price of $0.04 might be selected. Other embodiments might take an average, weighted average, or other combination to produce a value that might be rounded off to the nearest cent (or other appropriate value). In this case where the values range from $0.04 to $0.06 for the combinations of resource types, the final bid value might be $0.05 after computation.
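The three ways of deriving a single bid from a set of applicable bids can be sketched as follows. This is a hedged illustration rather than a definitive algorithm: the function name combine_bids and the strategy labels are hypothetical, and the list of applicable bids (one combination at $0.06 and two at $0.04) is an assumed reading of FIG. 7(a), which is not reproduced here.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def combine_bids(applicable_bids, strategy):
    """Reduce the bids that match an available resource to one bid amount."""
    if not applicable_bids:
        raise ValueError("no bid applies to this resource")
    if strategy == "highest":
        return max(applicable_bids)
    if strategy == "lowest":
        return min(applicable_bids)
    if strategy == "average":
        mean = sum(applicable_bids) / len(applicable_bids)
        # Round to the nearest cent, with halves rounded up.
        return mean.quantize(CENT, rounding=ROUND_HALF_UP)
    raise ValueError(f"unknown strategy: {strategy}")


# Assumed bids for the pairwise combinations matching the available node.
bids = [Decimal("0.06"), Decimal("0.04"), Decimal("0.04")]

highest = combine_bids(bids, "highest")
lowest = combine_bids(bids, "lowest")
average = combine_bids(bids, "average")
```

A weighted average could be substituted by passing per-aspect weights; the unweighted mean is used here only because the source does not specify weights.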

FIG. 7(b) illustrates a set of bid values 750 that can be used in accordance with another embodiment. In this example, there can be a default compute capacity (e.g., a standard server or compute device offered by the service) and a user can submit a default bid price ($0.04 in this example) to be used for the default type of resource. The user can also specify bid adjustment values to be used when resources with certain capacity values or types become available. For example, if a resource becomes available with increased compute capacity B, the user might be willing to increase the bid amount by $0.02/hr. The user might not be willing to adjust the default bid price based on IOPS, such that an adjustment value of zero (or another such value) is entered. In this example, the user will not bid for bandwidth below bandwidth rate B, and thus has entered a “no bid” value for bandwidth A such that no bid will be used if a resource becomes available without at least bandwidth value B. Another approach that can be used as opposed to a bid increase value is to use a bid decrease value. In this example, the user prefers not to use a resource with maximum latency value A, and has indicated a negative bid adjustment of $0.01, such that if a default resource becomes available with only maximum latency A, the bid amount can be calculated to be $0.03. Various other such approaches can be used as well as should be apparent to one of ordinary skill in the art in light of the teachings and suggestions contained herein.
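The default-plus-adjustment scheme of this example can be sketched as follows. The code is a minimal illustration under stated assumptions: the names adjusted_bid, ADJUSTMENTS, and NO_BID are hypothetical, aspects are keyed by simple labels such as "compute" and "B", and any aspect level without an entry is assumed to leave the default bid unchanged.

```python
from decimal import Decimal

NO_BID = object()  # sentinel meaning the user declines to bid at all

DEFAULT_BID = Decimal("0.04")
ADJUSTMENTS = {
    ("compute", "B"): Decimal("0.02"),   # willing to pay more for compute capacity B
    ("iops", "B"): Decimal("0.00"),      # no change to the bid based on IOPS
    ("bandwidth", "A"): NO_BID,          # will not bid below bandwidth rate B
    ("latency", "A"): Decimal("-0.01"),  # bid decrease for maximum latency A
}


def adjusted_bid(resource):
    """Return the hourly bid for a resource, or None when no bid is made.

    `resource` maps an aspect name to its level, e.g. {"compute": "B"}.
    """
    bid = DEFAULT_BID
    for aspect, level in resource.items():
        key = (aspect, level)
        if key not in ADJUSTMENTS:
            continue  # unlisted levels leave the default bid unchanged (assumption)
        adjustment = ADJUSTMENTS[key]
        if adjustment is NO_BID:
            return None
        bid += adjustment
    return bid


default_with_latency_a = adjusted_bid({"compute": "A", "bandwidth": "B", "latency": "A"})
high_compute = adjusted_bid({"compute": "B", "bandwidth": "B", "latency": "B"})
low_bandwidth = adjusted_bid({"compute": "B", "bandwidth": "A"})
```

Treating "no bid" as a sentinel rather than a numeric value keeps a declined bid distinct from a bid of zero.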

It should be understood, however, that a system might not always select the highest bid for an available resource. In some embodiments, there might be a pool of users requesting excess capacity where at least some of those users have a different set of bid prices. As discussed above, some users might be given priority based on a type of user, a type of access requested, and whether the user has at least a portion of a request already processed. For example, if a user has a request almost completed with a bid price of $0.04, and there is another user with a bid price of $0.05 but that user's request has not yet started processing, the system might be configured to attempt to complete the first request first, even though the bid price is lower. Such an approach can attempt to optimize on aspects such as throughput or latency as opposed to price.

In some embodiments, a PES Manager might look to the type of resources available. For example, if there is a limited number of high processing capacity devices, the PES Manager might attempt to process requests with bid adjustments for high processing capacity devices even though there might be other requests pending with higher bid amounts. For example, consider request A with a default bid of $0.04 and a bid adjustment of $0.02 for higher compute capacity resources. Also, consider request B with a bid of $0.08 for any type of compute capacity. If a node becomes available with a high compute capacity, the PES Manager might decide to process request A instead of request B, as the system will make more money by processing request A with the higher capacity resource and processing request B with the next available resource (which will not affect request B's bid price). Various other such examples can be imagined in light of the present disclosure, such as where users are given priority based on bandwidth, latency, or other such aspects as opposed to, or in combination with, bid price.
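The revenue reasoning in this example can be made concrete with a small comparison. The sketch is illustrative only and rests on assumptions not stated in the source: that the next available node has default compute capacity, that both requests are eventually processed for one hour each, and that the hypothetical helper bid_for captures each request's pricing.

```python
from decimal import Decimal


def bid_for(request, high_capacity):
    """Hourly bid of a request on a node, applying any high-capacity adjustment."""
    adjustment = request["high_capacity_adjustment"] if high_capacity else Decimal("0")
    return request["default_bid"] + adjustment


request_a = {"default_bid": Decimal("0.04"), "high_capacity_adjustment": Decimal("0.02")}
request_b = {"default_bid": Decimal("0.08"), "high_capacity_adjustment": Decimal("0")}

# Option 1: request A gets the high capacity node, request B the next (default) node.
a_on_high = bid_for(request_a, high_capacity=True) + bid_for(request_b, high_capacity=False)
# Option 2: request B gets the high capacity node, request A the next (default) node.
b_on_high = bid_for(request_b, high_capacity=True) + bid_for(request_a, high_capacity=False)

print(a_on_high, b_on_high)
```

Under these assumptions the first option yields more total revenue per hour, which is why a lower headline bid might still be placed on the scarcer resource.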

In one example, a user might submit a rights request to the PES service for a level of reserved committed IOPS, wherein the user requests the ability to create volumes (at a reduced price) that have 20,000 committed IOPS over the next three years. For example, the user might want to reserve a level of IOPS in case the user has to perform disaster recovery or another such process. Such a user might reserve capacity in two separate geographical areas in case of a data center failure or other such event, such that the user can launch instances in another geographical area if one area becomes unavailable, but might only operate in one of those geographical areas during normal operations. The user could alternatively request volumes with a total committed IOPS of 20,000 over the next three years, and could pay more for the dedicated volumes than for the dedicated ability (or reservation) to create those volumes over the same period, whether the user actually uses the capacity or not. A user with such a reservation then can be guaranteed to be able to create a volume with up to 20,000 IOPS when the user actually attempts to create the volume, and can be charged a slightly larger amount than would be charged for on-demand committed IOPS. A user with reserved capacity in at least some systems is not charged when the user does not have active reserved committed IOPS volumes during the reservation period, such that the user is incentivized to destroy volumes when those volumes are not being used, which can free up excess capacity for other users or at least reduce the number of devices needed to provide the necessary capacity for all users.

A user with dedicated and/or reserved capacity might not be using all of that capacity at all times, such that other users can potentially be able to utilize at least a portion of that unused capacity. For example, if the dedicated user with 20,000 reserved IOPS is only using 10,000 IOPS, then another user wanting a volume with 100 committed IOPS can, in at least some embodiments, utilize the unused capacity (the “remnant” capacity) from the dedicated user. Further, the other user can submit a bid per month (or other appropriate period as discussed elsewhere herein) to utilize that unused capacity when available. The rate charged for usage of remnant capacity can be less than would be charged for dedicated or other types of capacity, as a remnant user might have processing terminated, paused, moved, or otherwise interrupted if the dedicated user for that capacity begins or resumes using that capacity. The dedicated user can set minimum bids for usage of the remnant capacity in some examples, or can use a dynamic bidding process in order to charge whatever the market will yield at a particular time. In some embodiments, a bidder can indicate a maximum price, and if that bid is at or above a currently determined market price and there is available capacity, the bid can be accepted as discussed above.

As discussed above, an excess capacity user can submit multiple bids based on other aspects of the resource capacity, such as a base bid of $0.04/hr for the 100 IOPS and a bid of $0.06 if the bandwidth is also above 100 mbps. The base bid also can have minimum criteria for the other capacity values, such that the excess capacity user will not provide a bid if the resource cannot provide at least 50 mbps. In some embodiments, the system can provide various bid “packages” wherein a user can provide bids for fixed combinations of capacity values, such as compute capacity, storage capacity, IOPS, bandwidth, latency, and/or other such aspects. A user might accept a lease for a certain amount of throughput, and there also can be various levels or “tiers” of service that people bid against. In some systems, a user can request a minimum capacity (such as 100 IOPS) and bid for improved capacity, such as up to 1000 IOPS. The user in some embodiments could bid for IOPS (or other capacity types) in increments, such as increments of 100 IOPS. Various rules and policies can be used to govern the bidding, acceptance, and usage of the capacity, such as to optimize for resource usage or overall revenue as discussed elsewhere herein. For example, the system could adjust the market price downward such that more bids are accepted, in order to increase resource usage until the system reaches a threshold level of usage or other such target. In some cases a higher bid will always be accepted before a lower bid, while in other cases a request with an overall higher profit will be accepted first or requests will be received to optimize throughput, etc. Users can be provided with historical data to help with setting bid prices, such as may be based upon historical data approaches used for conventional bidding processes known in the art.

In some embodiments, a user having a bid accepted for excess capacity can receive a guarantee that the user will be allowed to use that capacity for at least a minimum period of time, such as 15 minutes, an hour, etc., whereby a dedicated user for that resource cannot reclaim that capacity until at least that guaranteed period of time has passed. In some embodiments, the market price can be adjusted at each such period, such as every 15 minutes, and the user's bid can be reevaluated such that if the bid price is no longer at market value, the use of that resource by the excess capacity user can be terminated. If a user with a compute instance has use terminated, for example, that instance can be turned off, while users with data volumes can have the volumes destroyed upon termination. In some cases, a snapshot of the data volume will be taken before the data volume is destroyed, whereby the volume can be recreated at a later time or the data can otherwise be recovered. If the user has a level of throughput, bandwidth, or latency terminated, that user could be downgraded to a lower level of service, such as an uncommitted level of service instead of a committed level of service, such as where a user would get a rate of IOPS or bandwidth based upon the resources available at that time.
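The periodic reevaluation of an accepted bid against an adjusted market price can be sketched as follows. This is an illustrative reading only: the function name reevaluate_bid is hypothetical, each list entry is assumed to be the market price set at the start of one guaranteed period (e.g., one 15 minute period), and a bid is treated as "at market value" when it is at or above that price.

```python
from decimal import Decimal


def reevaluate_bid(bid, market_prices):
    """Walk through successive guaranteed periods and decide whether use continues.

    Use continues for a period when the bid meets that period's market price;
    the first period in which it does not ends the walk with a termination.
    """
    outcomes = []
    for period, price in enumerate(market_prices):
        if bid >= price:
            outcomes.append((period, "continue"))
        else:
            outcomes.append((period, "terminate"))
            break
    return outcomes


prices = [Decimal("0.05"), Decimal("0.06"), Decimal("0.07"), Decimal("0.05")]
outcomes = reevaluate_bid(Decimal("0.06"), prices)
```

In this sketch the later drop in market price is never reached, since use has already been terminated; an embodiment that keeps the request pending could instead resume it when the price falls.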

In some embodiments, a user can potentially pay for “bursts” of resource usage. A burst as used herein refers to a temporarily increased amount of resource usage, where a user goes over the allocated amount of capacity for up to a specified period of time. In this case, the user is essentially reserving capacity, but might be able to pay less for the additional capacity than for reserved capacity when the user agrees that the usage will be for at most a specified period of time, such as 15 seconds, one minute, etc. In such an instance, processing for an excess capacity user or other such user might be temporarily suspended to allow for the burst of usage. Various other types of bidding arrangements can be utilized as well, such as where a user purchases a committed overall amount of capacity, but applies that capacity commitment across multiple resources. Bidding for additional capacity can also be dynamic, such as where the user is willing to purchase dedicated capacity when the market price drops below a specified level.

In some embodiments, a user going over the guaranteed or dedicated capacity might be able to obtain additional capacity, but might have to pay the current market price for uncommitted request processing. A user with 100 guaranteed IOPS then would have to pay market price for the 101st I/O operation. Users then can exceed their guarantees when necessary, without having to provide a relatively large set of bid amounts to cover various situations. As long as the capacity is available, the user can be allowed to utilize the excess capacity. In some cases, users might be capped to a certain level of usage. If a user does not want to pay for a lot of, or any, excess usage, the user might put a limit on the amount of resource capacity that can be provided to the user. For example, the user might indicate that requests should only be processed up to the guaranteed amount, such as up to only 100 IOPS. In other embodiments, a user might set a threshold amount or price, such as where the user will set a maximum cap of 110 IOPS or a maximum excess charge of $0.50/hr, which enables requests to be processed up to an amount that is based at least in part upon the current market price. In some embodiments, a user can request to be notified if excess usage is detected, in order to evaluate aspects such as whether additional capacity should be purchased or whether the user application is not running as expected.
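The effect of a usage cap or a maximum excess charge on overage can be sketched as follows. The sketch is a hypothetical illustration: the function name serve_iops, the per-IOPS hourly market price of $0.02, and the rule that excess is billed per whole IOPS above the guarantee are all assumptions, as the source does not specify a pricing unit for excess usage.

```python
from decimal import Decimal


def serve_iops(requested, guaranteed, market_price, max_iops=None, max_excess_charge=None):
    """Return (IOPS served, hourly excess charge) under optional user limits.

    `market_price` is an assumed hourly price per IOPS above the guarantee.
    """
    excess = max(0, requested - guaranteed)
    if max_iops is not None:
        # Hard cap on total IOPS, e.g. 110 for a 100 IOPS guarantee.
        excess = min(excess, max(0, max_iops - guaranteed))
    if max_excess_charge is not None and market_price > 0:
        # Largest whole number of excess IOPS affordable at the current price.
        affordable = int(max_excess_charge // market_price)
        excess = min(excess, affordable)
    served = min(requested, guaranteed) + excess
    return served, excess * market_price


price = Decimal("0.02")  # hypothetical market price per excess IOPS per hour

capped_by_iops = serve_iops(150, 100, price, max_iops=110)
capped_by_charge = serve_iops(150, 100, price, max_excess_charge=Decimal("0.50"))
guarantee_only = serve_iops(150, 100, price, max_iops=100)
```

Because the charge limit is converted to IOPS using the current market price, the amount served under that limit rises when the price falls, consistent with the cap being based at least in part on the market price.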

Users can also, in at least some embodiments, adjust their bid prices as often as necessary, as may be based upon the importance of certain requests, current applications being executed, etc. In such a spot market, a user can increase a bid amount to ensure that the user gets priority to extra capacity (e.g., extra IOPS or bandwidth) when that capacity becomes available. The user can also monitor the current market price, and can adjust bids dynamically to ensure that the bids submitted at least meet market price. A user also can have the option to specify, for each request where the user does not have guaranteed capacity or is over that capacity, whether to purchase generally available resources that are not guaranteed (e.g., on-demand variable capacity), or excess capacity from dedicated users that can be guaranteed for at least a period of time.
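The claims characterize a winning bid as one that is greater than other pending bids for the same excess capacity and at least equal to the current market price. A minimal sketch of that selection is shown below. The function name `winning_bid` and the dictionary of pending bids keyed by user are hypothetical, and the tie-breaking rule (the first eligible bid encountered wins) is an assumption, since the disclosure does not specify one.

```python
def winning_bid(pending_bids, market_price):
    """Return the user whose bid wins the excess capacity, or None.

    pending_bids maps a user identifier to that user's bid price.
    """
    # Only bids that at least meet the market price are eligible.
    eligible = {user: price for user, price in pending_bids.items()
                if price >= market_price}
    if not eligible:
        return None
    # The highest eligible bid wins; ties go to the first one encountered.
    return max(eligible, key=eligible.get)
```

A user who raises a bid, as described above, simply changes that user's entry in `pending_bids` before the next selection is run.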

In another example of resource capacity usage, a user might be doing sequential file access and thus might also be interested in the bandwidth for the number of input/output (I/O) operations. Thus, the user might be willing to only bid for a minimum level of megabytes per second (mbps), gigabytes per second, or other such rate. As opposed to random I/O, where the main limiter to the amount of data movement is often the physical head movement speed of the disk, the limiting factor is how quickly data can be streamed from the physical data source, or in particular for at least some embodiments how much data can be pushed through the network interface that connects the virtual computing instances with the virtual disk drives. Certain applications require a level of I/O bandwidth coming from their virtual disks in order to achieve the business goals around computational latency. One example of such applications relates to financial markets, where there are only a few “dark” hours for data processing before the market reopens, and all data simulations must be performed during those few dark hours. The conventional approach of purchasing additional hardware is not optimal, as the hardware would be largely sitting idle when the simulations are not being run. Using a system or service such as a program execution service (PES) enables a user to purchase or reserve excess capacity as needed. In some cases the user can bid for guaranteed capacity during only certain hours, which can be treated either as dedicated or reserved capacity in different embodiments, while in other embodiments the user can just purchase a daily guarantee and the PES Manager can perform the scheduling in order to provide lower cost processing to the user. The PES Manager can also manage other users to further reduce costs, such as to allocate other types of users for a resource, such as users who are doing cold storage and do not require committed bandwidth. A specified amount of bandwidth can also be provided, for a period of time, inside a cluster on a shared resource.

FIGS. 8(a) and 8(b) illustrate example time windows that can be used for scheduling periods with specified bandwidth rates in accordance with various embodiments. In FIG. 8(a), a user requests a volume with a 200 GB capacity in a specific geographical region, with 100 mbps of bandwidth between 11 p.m. and 12 a.m. every day. This corresponds to a fixed window 802, where the level of bandwidth is provided during specified times in which that capacity is dedicated to that user. In another example, the user could request 100 mbps of bandwidth for a period of sixty minutes each day, and may not care when that sixty minute period is scheduled. This can correspond to a sliding window 804, which has a specified duration (here sixty minutes) that can be provided at any time throughout the day, as may be determined by a PES Manager or other module or component. Thus, for a resource capacity such as bandwidth, there can be at least one additional parameter that specifies one or more aspects of a time windowing approach to be used for the processing. For example, a user might have the 200 GB capacity 24 hours a day, but might only obtain a level of at least 100 mbps during the specified window of time for which the user is willing to pay for the guarantee. Outside that time window, the user can get a different rate, such as might be available for the resource at that point in time. In some embodiments, a system might provide a minimum guarantee for sequential access, such as at least 10 mbps, while in other embodiments a user without a guarantee might have no ability to rely upon a minimum bandwidth (although the system in general will typically want to avoid bottlenecks and lack of bandwidth in order to avoid losing customers). In some embodiments, a customer might have a first guarantee to be used as a default, such as at least 50 mbps throughout the day, and a second guarantee within a specified time window, such as at least 200 mbps for a fifteen minute period each day.

Bandwidth capacity thus can be treated differently from capacity such as compute or IOPS capacity, for example, as a customer may utilize a relatively consistent rate of IOPS over time. For applications such as high performance data computing (HPC) or data warehousing, however, the customer will typically read a large amount of data at the beginning of a process, streaming data from disk for a period of twenty to thirty minutes, for example, and then will not stream data for a period of time, such as a number of hours, while that data is being processed. Then, near the end of the process, the customer will stream the data back to disk for a period of time, such as ten to twenty minutes. It thus may not be cost effective for customers to purchase committed bandwidth on a monthly (or other such) basis, as the user might with IOPS, as the customer may only be using that level of bandwidth for a small portion of the time in specific windows of time. Enabling the customer to obtain capacity rates for specific time windows enables the cost to be lowered as the customer does not pay for the capacity over an entire month, and also enables costs to be reduced as multiple users with different types of workloads can utilize the same resources, and thus can share the costs. A customer thus can get a fixed window of time each day, or a sliding time window that can be processed at any time of day, while other users are being served using that resource. In one example, such as is illustrated using the schedule 820 of FIG. 8(b), a user might request a period of twenty minutes of 100 mbps capacity within a five hour window 824, where a sliding inner window 822 represents the twenty minute period that can be provided anywhere within the five hour window 824. A customer might request such an approach when the customer wants twenty minutes of high bandwidth capacity, and doesn't care when those twenty minutes are provided as long as they are provided during the five hours when the customer business is closed, for example. In such an example, the customer might pay for 50 mbps averaged per hour for all other times outside that twenty minute sliding window. The PES Manager might then also provide and manage slices of time, or time slots, along with pools of available bandwidth resources. Pricing also can be reduced using any of a number of other appropriate factors, such as the length of the commitment, number of commitments, guaranteed minimums, etc.
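One way a scheduler could place a sliding inner window inside an outer window is sketched below. The per-minute list of uncommitted bandwidth, the function name `place_sliding_window`, and the earliest-fit policy are all assumptions made for illustration. The disclosure leaves the scheduling method to the PES Manager.

```python
def place_sliding_window(free_mbps, outer_start, outer_end, duration, rate):
    """Return the first start minute where the inner window fits, or None.

    free_mbps[i] is the assumed uncommitted bandwidth during minute i.
    The inner window must lie entirely within [outer_start, outer_end).
    """
    for start in range(outer_start, outer_end - duration + 1):
        window = free_mbps[start:start + duration]
        # Every minute of the candidate window must offer the requested rate.
        if all(free >= rate for free in window):
            return start
    return None


# Example: a five hour (300 minute) outer window in which only minutes
# 120 through 139 have 100 mbps free; a twenty minute request lands there.
free = [50] * 300
free[120:140] = [100] * 20
start_minute = place_sliding_window(free, 0, 300, 20, 100)
```

A fixed window is the special case where the outer window has the same length as the requested duration, so there is only one candidate start.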

In addition to bandwidth, rate of I/O operations, and other such capacities, a customer might also be willing to pay for a maximum or average latency target for requests as discussed above. In some systems the latency might not be separately managed, as guaranteed levels of IOPS and bandwidth can at least partially control the latency that a customer receives. A guaranteed rate of 1,000 IOPS, however, can potentially be met by delivering 10,000 I/O operations over a period of 10 seconds. It may often be the case that a 10 second latency value will be unacceptable to various customers. If a customer wants an average latency of 15 milliseconds, or even a maximum latency of 15 milliseconds for high throughput applications, the system must provide some controls, limits, guidelines, guarantees, or other such aspects in order to provide acceptable levels of latency even when guaranteed levels of throughput are being met. In at least one embodiment, guaranteed levels of latency can be provided by managing requests such that there are not more than two outstanding operations on any spindle or other physical storage device at any time. Such an approach can potentially reduce throughput, particularly for sequential operations, such that a balance might be struck between latency and throughput. An example of such a balancing approach is described in co-pending U.S. patent application Ser. No. 12/749,451, filed Mar. 29, 2010, entitled “Dynamically Changing Quality of Service Levels,” which is hereby incorporated herein by reference. In other embodiments, the PES Manager can monitor loads on various resources and can determine how many operations can be sent to a device at the current time while still meeting guarantees. Thus, many customers might be willing to submit different bids for different combinations of IOPS and latency, as discussed above, where a customer with a guaranteed rate of IOPS is willing to pay extra for a particular latency guarantee, or is not willing to submit a bid when a minimum latency cannot be provided, even if the resource is able to provide the guaranteed rate of IOPS.

An approach in accordance with one embodiment is to use flash memory or another such solid state storage solution for at least part of the guaranteed capacity, which can be provided as part of dedicated, reserved, or excess capacity. A general environment 900 for providing such components is illustrated in FIG. 9. It should be understood that many additional components can be used to provide functionality as discussed and suggested herein, and as would be apparent to one of ordinary skill in the art in light of the teachings and suggestions contained herein. In this example, a customer 902 subscribes (over a network 904) to a program execution service including a PES Manager 906 for managing the processing of requests, execution of programs, and other such aspects on behalf of the customer 902. The customer might request a dedicated 1 TB volume to be provided and managed by the service. The volume could be created using conventional disk-based storage 908, storing data across one or more drives or spindles, but latency for such storage can be limited by the physical constraints of the storage mechanism (e.g., seek times, etc.). A customer might want a guaranteed average or maximum latency that is lower than can be provided with the disk-based storage. The PES Manager could instead create the volume using solid state storage, such as one or more “flash” storage devices, which can provide a much lower average latency as there are no delays due to head movements or other such mechanical constraints. Such an approach, however, can be prohibitively expensive for certain customers in a conventional environment, as storing a volume of data to a solid state drive (SSD) is currently significantly more expensive than storing the same volume of data to conventional disk-based storage.

An approach in accordance with various embodiments enables a balancing of speed and cost by enabling a portion of the volume for the customer to be stored using one or more solid state drives 908, while storing the remainder of the volume to disk-based storage 910. Further, the latency that the customer receives can be monitored, and the amount of data stored to the SSD can change over time, as the PES Manager can cause varying amounts of data to be shifted between the SSD and disk storage at different times in order to remain within an allowable range of the latency target. For example, a customer might have an average latency guarantee of 15 ms, with a maximum latency guarantee of 20 ms. At a point in time under a current load, the conventional storage might only be able to provide a latency of 18 ms. By way of contrast, a flash device might be able to provide a latency of 3 ms. A PES Manager or other such component or algorithm thus can compute how much of the volume should be moved to flash in order to reduce the average latency to meet the latency guarantee, while moving the minimum amount of data to flash to minimize cost.
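The computation mentioned above can be illustrated with a simple model. The sketch assumes that requests are spread uniformly over the volume, so that the average latency is a weighted mix of the flash and disk latencies: average = f × flash + (1 − f) × disk, where f is the fraction of the volume on flash. That uniform-access model and the function name `min_flash_fraction` are assumptions for illustration and are not stated in the disclosure.

```python
def min_flash_fraction(disk_ms, flash_ms, target_ms):
    """Smallest fraction of the volume on flash that meets the average target.

    Solves f * flash_ms + (1 - f) * disk_ms <= target_ms for f, assuming
    accesses are spread uniformly across the volume.
    """
    # Disk alone already meets the target, so no flash is needed.
    if disk_ms <= target_ms:
        return 0.0
    # Even an all-flash volume cannot meet a target below the flash latency.
    if target_ms < flash_ms:
        raise ValueError("target latency is below what flash can provide")
    return (disk_ms - target_ms) / (disk_ms - flash_ms)


# Figures from the example above: 18 ms disk, 3 ms flash, 15 ms target.
fraction = min_flash_fraction(18.0, 3.0, 15.0)
```

For the figures in the text, the result is (18 − 15) / (18 − 3) = 0.2, so under this model about one fifth of the volume would be placed on flash. A real workload with skewed access patterns could need less if the most frequently accessed data is moved first.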

FIG. 10 illustrates an example process 1000 that can be used to provide the guaranteed latency in accordance with at least one embodiment. In this example, a PES Manager (or other such module or component) determines the committed or guaranteed latency target 1002 to be provided to the user as part of a dedicated or excess resource capacity agreement. As requests for the user are processed, the PES Manager can monitor the actual latency (e.g., average, maximum, etc.) that the user receives 1004, and can determine whether the actual latency received is above the latency target 1006. If the actual value is above the target, such that the average latency is greater than the committed value, the PES Manager can determine an amount of data to be moved to flash storage or another SSD 1008, and can cause that amount of data to be moved to flash storage 1010 in order to reduce the latency to near the latency target. If, instead, the actual latency is determined to be below the latency target 1012, the PES Manager can determine an amount of data to be moved from flash storage or another SSD 1014, and can cause that amount of data to be moved to disk storage 1016 in order to reduce the cost of the processing while still remaining within the guaranteed amount of latency for that customer.
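One pass of the monitoring process of FIG. 10 might be sketched as shown below. The fixed adjustment step, the slack band that prevents data from being shuffled back and forth when latency sits just under the target, and the function name `rebalance` are all assumptions added for illustration. The disclosure does not specify how the amount of data to move is chosen.

```python
def rebalance(actual_ms, target_ms, flash_fraction, step=0.05, slack_ms=1.0):
    """Return the new fraction of the volume to keep on flash.

    step and slack_ms are assumed tuning parameters, not values from the
    disclosure.
    """
    if actual_ms > target_ms:
        # Latency exceeds the guarantee: shift more data to flash.
        return min(1.0, flash_fraction + step)
    if actual_ms < target_ms - slack_ms:
        # Latency is comfortably under the guarantee: shift data back to
        # disk to reduce cost.
        return max(0.0, flash_fraction - step)
    # Within the slack band: leave the placement unchanged.
    return flash_fraction
```

Calling the function once per monitoring interval with the latest measured latency gives a simple feedback loop that keeps the flash share near the minimum that still meets the guarantee.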

In some embodiments, the customer might be charged for the amount of flash used, while in other embodiments the customer will simply pay a flat fee for the latency guarantee, and the system will manage the processing such that only the minimum amount of flash is used at any time, and can compute an appropriate flat fee based on average usage or some other such information.

Further, depending on load, type of operation, and other such information, the amount of data in flash can vary over time. A PES Manager can monitor the usage, and can move data into, and out of, flash as appropriate to meet the guarantees but utilize the more expensive flash storage as little as possible. In some embodiments, a customer could pay for the usage in other ways, such as by paying for a certain percentage of the operations (e.g., 10% or 50%) to be returned in under 5 ms, etc., and that percentage could be stored to flash. That percentage can be monitored over time as well, with data being moved as needed to stay as close to that percentage as possible. In certain embodiments there is substantially only one operation on a physical device at any time, but a customer might want a level of latency that is less than can be provided with conventional storage, such that solid state storage may be preferable for at least a portion of the operations.

In some embodiments, a level of latency provided will be determined for each customer over a given period to be used for billing the customer, as opposed to charging a flat fee, etc. In some embodiments, the customer will provide the desired latency profile ahead of time, and the system will have to use monitoring information and prediction algorithms in order to attempt to meet that latency profile. Any appropriate prediction algorithm can be used, such as a random, read-ahead, or least recently used (LRU) algorithm, although a greedy-dual algorithm or other weighted prediction algorithm can be used as well within the scope of the various embodiments.

FIG. 11 illustrates an example of an environment 1100 that can utilize and/or take advantage of aspects in accordance with various embodiments. As will be appreciated, although a Web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The environment 1100 shown includes an electronic client device 1102, which can include any appropriate device operable to send and receive requests, messages, or information over an appropriate network 1104 and convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, set-top boxes, personal data assistants, electronic book readers, and the like. The network can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, or any other such network or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled by wired or wireless connections, and combinations thereof. In this example, the network includes the Internet, as the environment includes a Web server 1106 for receiving requests and serving content in response thereto, although for other networks an alternative device serving a similar purpose could be used as would be apparent to one of ordinary skill in the art.

The illustrative environment includes at least one application server 1108 and a data store 1110. It should be understood that there can be several application servers, layers, or other elements, processes, or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. As used herein the term “data store” refers to any device or combination of devices capable of storing, accessing, and retrieving data, which may include any combination and number of data servers, databases, data storage devices, and data storage media, in any standard, distributed, or clustered environment. The application server can include any appropriate hardware and software for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling a majority of the data access and business logic for an application. The application server provides access control services in cooperation with the data store, and is able to generate content such as text, graphics, audio, and/or video to be transferred to the user, which may be served to the user by the Web server in the form of HTML, XML, or another appropriate structured language in this example. The handling of all requests and responses, as well as the delivery of content between the client device 1102 and the application server 1108, can be handled by the Web server. It should be understood that the Web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein.

The data store 1110 can include several separate data tables, databases, or other data storage mechanisms and media for storing data relating to a particular aspect. For example, the data store illustrated includes mechanisms for storing production data 1112 and user information 1116, which can be used to serve content for the production side. The data store also is shown to include a mechanism for storing log data 1114, which can be used for reporting, analytics, or other appropriate reasons. It should be understood that there can be many other aspects that may need to be stored in the data store, such as for page image information and access right information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store 1110. The data store 1110 is operable, through logic associated therewith, to receive instructions from the application server 1108 or development server 1120, and obtain, update, or otherwise process data in response thereto. In one example, a user might submit a search request for a certain type of item. In this case, the data store might access the user information to verify the identity of the user, and can access the catalog detail information to obtain information about items of that type. The information then can be returned to the user, such as in a results listing on a Web page that the user is able to view via a browser on the user device 1102. Information for a particular item of interest can be viewed in a dedicated page or window of the browser.

Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server, and typically will include a computer-readable medium storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available, and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein.

The environment in one embodiment is a distributed computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 11. Thus, the depiction of the system 1100 in FIG. 11 should be taken as being illustrative in nature, and not limiting to the scope of the disclosure.

An environment such as that illustrated in FIG. 11 can be useful for a provider such as an electronic marketplace, wherein multiple hosts might be used to perform tasks such as serving content, authenticating users, performing payment transactions, or performing any of a number of other such tasks. Some of these hosts may be configured to offer the same functionality, while other servers might be configured to perform at least some different functions. The electronic environment in such cases might include additional components and/or other arrangements, such as those illustrated in the configuration 200 of FIG. 2, discussed in detail below.

As discussed above, the various embodiments can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices, or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless, and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems, and other devices capable of communicating via a network.

Various aspects also can be implemented as part of at least one service or Web service, such as may be part of a service-oriented architecture. Services such as Web services can communicate using any appropriate type of messaging, such as by using messages in extensible markup language (XML) format and exchanged using an appropriate protocol such as SOAP (derived from the “Simple Object Access Protocol”). Processes provided or executed by such services can be written in any appropriate language, such as the Web Services Description Language (WSDL). Using a language such as WSDL allows for functionality such as the automated generation of client-side code in various SOAP frameworks.

Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS, and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, and any combination thereof.

In embodiments utilizing a Web server, the Web server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more Web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Perl, Python, or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, and IBM®.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers, or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch screen, or keypad), and at least one output device (e.g., a display device, printer, or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices, and solid-state storage devices such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.

Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services, or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

What is claimed is:
1. A computer-implemented method of managing shared resources, comprising: under control of one or more computer systems configured with executable instructions, receiving an instance request associated with a user, the instance request specifying a type of resource capacity and a rate of input/output operations to be used in providing an instance for the user, the instance being operable to handle I/O operations on behalf of the user; if the type of capacity is a dedicated capacity type and dedicated capacity for the user with at least the rate of I/O operations is available, generating an instance for the user using the dedicated capacity; if the type of capacity is a reserved capacity type and reserved capacity for the user with at least the rate of I/O operations is available, generating an instance for the user using the reserved capacity; if the type of capacity is an excess capacity type: determining whether a bid price is a winning bid price, the winning bid price being greater than other pending bids for the same excess capacity and being at least equal to a current market price; if the bid price is the winning bid price and excess capacity for the user with at least the rate of I/O operations is available, generating an instance for the user using the excess capacity for at least a minimum period of time; and if the type of capacity is a variable capacity type and variable capacity is available, generating an instance for the user using the variable capacity, the variable capacity capable of having less than the rate of I/O operations specified for the instance request.
2. The computer-implemented method of claim 1, further comprising: if the dedicated capacity with the rate of I/O operations is not available for the instance request specifying the dedicated capacity type, changing the type of capacity specified by the instance request to one of an excess capacity type or a variable capacity type; if the reserved capacity with the rate of I/O operations is not available for the instance request specifying the reserved capacity type, changing the type of capacity specified by the instance request to one of an excess capacity type or a variable capacity type; if the excess capacity with the rate of I/O operations is not available for the instance request specifying the excess capacity type, changing the type of capacity specified by the instance request to a variable capacity type; and if the variable capacity is not available for the instance request specifying the variable capacity type, denying the instance request.
3. The computer-implemented method of claim 1, further comprising: if the instance request is being fulfilled using the excess capacity and the excess capacity becomes no longer available, moving an instance corresponding to the request to variable capacity if available.
4. The computer-implemented method of claim 1, further comprising: receiving a reservation request from a user to utilize resource capacity to fulfill one or more subsequent instance requests for the user, the reservation request specifying a rate of I/O operations to be used in fulfilling the one or more instance requests, each instance request corresponding to an instance to be created for I/O operations for the user; enabling the user to purchase dedicated capacity for fulfilling at least a portion of the instance requests if dedicated capacity is available with the specified rate of I/O operations, the dedicated capacity being available at any time for use by the user; enabling the user to purchase reserved capacity for fulfilling at least a portion of the instance requests if reserved capacity is available with the specified rate of I/O operations, the user being given priority to use the reserved capacity over other users; and if a user is unable to purchase dedicated or reserved capacity, in response to a subsequent instance request: enabling the user to bid on excess capacity for fulfilling at least a portion of the instance request if excess capacity is available with the specified rate of I/O operations at substantially a time of submission of the instance request, the excess capacity being available when a bid price for the user at least meets a market price for the excess capacity, the user being able to utilize the excess capacity for at least a period of time when the bid price meets at least one selection criterion; and enabling the user to utilize available variable on-demand capacity for fulfilling at least a portion of the instance request if available variable on-demand capacity is available.
5. The computer-implemented method of claim 4, further comprising: enabling the user to specify another type of capacity to use to fulfill any instance requests that exceed an amount of capacity specified by the user for at least a portion of the fulfillment.
6. The computer-implemented method of claim 4, further comprising: enabling the user to dynamically adjust the bid price for the excess capacity in order to continue fulfillment for the instance request using the excess capacity.
7. The computer-implemented method of claim 4, further comprising: enabling the user to submit a plurality of bids for excess capacity, each bid having a bid price based on a combination of levels of capacity for multiple categories of resource capacity.
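A plurality of bids, each priced for a combination of capacity levels across several categories, might be represented as in the following sketch. The `Bid` class, the category names (`"iops"`, `"storage_gb"`), and the rule of choosing the highest-priced bid that fits within the available excess capacity are all assumptions made for illustration. The claims do not specify how one of several bids is selected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Bid:
    price: float            # bid price for the whole combination
    levels: Dict[str, int]  # hypothetical category name -> requested level


def best_satisfiable_bid(
    bids: List[Bid], excess: Dict[str, int], market_price: float
) -> Optional[Bid]:
    """Pick the highest-priced bid that meets the market price and fits
    within the excess capacity available in every requested category."""
    candidates = [
        bid
        for bid in bids
        if bid.price >= market_price
        and all(excess.get(cat, 0) >= level for cat, level in bid.levels.items())
    ]
    # max() with default=None returns None when no bid qualifies.
    return max(candidates, key=lambda bid: bid.price, default=None)
```

For example, a user could submit one bid for 500 I/O operations per second with 100 GB of storage and a cheaper bid for 200 I/O operations per second with the same storage. If only 300 I/O operations per second of excess capacity are available, only the second bid can be satisfied.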
8. A computer system for managing shared resources, comprising: one or more processors; and memory, including instructions executable by the one or more processors to cause the computer system to at least: receive an instance request associated with a user, the instance request specifying a type of resource capacity and a rate of input/output operations to be used in providing an instance for the user, the instance being operable to handle I/O operations on behalf of the user; if the type of capacity is a dedicated capacity type and dedicated capacity for the user with at least the rate of I/O operations is available, generate an instance for the user using the dedicated capacity; if the type of capacity is a reserved capacity type and reserved capacity for the user with at least the rate of I/O operations is available, generate an instance for the user using the reserved capacity; if the type of capacity is an excess capacity type: determine whether a bid price is a winning bid price, the winning bid price being greater than other pending bids for the same excess capacity and being at least equal to a current market price; if the bid price is the winning bid price and excess capacity for the user with at least the rate of I/O operations is available, generate an instance for the user using the excess capacity for at least a minimum period of time; and if the type of capacity is a variable capacity type and variable capacity is available, generate an instance for the user using the variable capacity, the variable capacity capable of having less than the rate of I/O operations specified for the instance request.
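The winning-bid condition recited for the excess capacity type, namely a bid price greater than other pending bids for the same excess capacity and at least equal to the current market price, can be written as a short predicate. The function name `is_winning_bid` and its parameters are hypothetical. Treating a tie with another pending bid as not winning follows from a literal reading of "greater than".

```python
from typing import Iterable


def is_winning_bid(
    bid_price: float, other_pending_bids: Iterable[float], market_price: float
) -> bool:
    """Return True if the bid is at least the market price and strictly
    greater than every other pending bid for the same excess capacity."""
    if bid_price < market_price:
        return False
    # A tie with any other pending bid does not count as winning.
    return all(bid_price > other for other in other_pending_bids)
```

With no other pending bids, any bid that meets the market price wins, because `all()` over an empty sequence is true.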
9. The computer system of claim 8, wherein the instructions further cause the computer system to: if the dedicated capacity with the rate of I/O operations is not available for the instance request specifying the dedicated capacity type, change the type of capacity specified by the instance request to one of an excess capacity type or a variable capacity type; if the reserved capacity with the rate of I/O operations is not available for the instance request specifying the reserved capacity type, change the type of capacity specified by the instance request to one of an excess capacity type or a variable capacity type; if the excess capacity with the rate of I/O operations is not available for the instance request specifying the excess capacity type, change the type of capacity specified by the instance request to a variable capacity type; and if the variable capacity is not available for the instance request specifying the variable capacity type, deny the instance request.
10. The computer system of claim 8, wherein the instructions further cause the computer system to: if the instance request is being fulfilled using the excess capacity and the excess capacity becomes no longer available, move an instance corresponding to the request to variable capacity if available.
11. The computer system of claim 8, wherein the instructions further cause the computer system to: receive a reservation request from a user to utilize resource capacity to fulfill one or more subsequent instance requests for the user, the reservation request specifying a rate of I/O operations to be used in fulfilling the one or more instance requests, each instance request corresponding to an instance to be created for I/O operations for the user; enable the user to purchase dedicated capacity for fulfilling at least a portion of the instance requests if dedicated capacity is available with the specified rate of I/O operations, the dedicated capacity being available at any time for use by the user; enable the user to purchase reserved capacity for fulfilling at least a portion of the instance requests if reserved capacity is available with the specified rate of I/O operations, the user being given priority to use the reserved capacity over other users; and if a user is unable to purchase dedicated or reserved capacity, in response to a subsequent instance request: enable the user to bid on excess capacity for fulfilling at least a portion of the instance request if excess capacity is available with the specified rate of I/O operations at substantially a time of submission of the instance request, the excess capacity being available when a bid price for the user at least meets a market price for the excess capacity, the user being able to utilize the excess capacity for at least a period of time when the bid price meets at least one selection criterion; and enable the user to utilize available variable on-demand capacity for fulfilling at least a portion of the instance request if available variable on-demand capacity is available.
12. The computer system of claim 11, wherein the instructions further cause the computer system to: enable the user to specify another type of capacity to use to fulfill any instance requests that exceed an amount of capacity specified by the user for at least a portion of the fulfillment.
13. The computer system of claim 11, wherein the instructions further cause the computer system to: enable the user to dynamically adjust the bid price for the excess capacity in order to continue fulfillment for the instance request using the excess capacity.
14. The computer system of claim 11, wherein the instructions further cause the computer system to: enable the user to submit a plurality of bids for excess capacity, each bid having a bid price based on a combination of levels of capacity for multiple categories of resource capacity.
15. A non-transitory computer-readable medium including instructions stored therein that, when executed by at least one computing device, cause the at least one computing device to: receive an instance request associated with a user, the instance request specifying a type of resource capacity and a rate of input/output operations to be used in providing an instance for the user, the instance being operable to handle I/O operations on behalf of the user; if the type of capacity is a dedicated capacity type and dedicated capacity for the user with at least the rate of I/O operations is available, generate an instance for the user using the dedicated capacity; if the type of capacity is a reserved capacity type and reserved capacity for the user with at least the rate of I/O operations is available, generate an instance for the user using the reserved capacity; if the type of capacity is an excess capacity type: determine whether a bid price is a winning bid price, the winning bid price being greater than other pending bids for the same excess capacity and being at least equal to a current market price; if the bid price is the winning bid price and excess capacity for the user with at least the rate of I/O operations is available, generate an instance for the user using the excess capacity for at least a minimum period of time; and if the type of capacity is a variable capacity type and variable capacity is available, generate an instance for the user using the variable capacity, the variable capacity capable of having less than the rate of I/O operations specified for the instance request.
16. The non-transitory computer-readable medium of claim 15, wherein the instructions further cause the at least one computing device to: if the dedicated capacity with the rate of I/O operations is not available for the instance request specifying the dedicated capacity type, change the type of capacity specified by the instance request to one of an excess capacity type or a variable capacity type; if the reserved capacity with the rate of I/O operations is not available for the instance request specifying the reserved capacity type, change the type of capacity specified by the instance request to one of an excess capacity type or a variable capacity type; if the excess capacity with the rate of I/O operations is not available for the instance request specifying the excess capacity type, change the type of capacity specified by the instance request to a variable capacity type; and if the variable capacity is not available for the instance request specifying the variable capacity type, deny the instance request.
17. The non-transitory computer-readable medium of claim 15, wherein the instructions further cause the at least one computing device to: if the instance request is being fulfilled using the excess capacity and the excess capacity becomes no longer available, move an instance corresponding to the request to variable capacity if available.
18. The non-transitory computer-readable medium of claim 15, wherein the instructions further cause the at least one computing device to: receive a reservation request from a user to utilize resource capacity to fulfill one or more subsequent instance requests for the user, the reservation request specifying a rate of I/O operations to be used in fulfilling the one or more instance requests, each instance request corresponding to an instance to be created for I/O operations for the user; enable the user to purchase dedicated capacity for fulfilling at least a portion of the instance requests if dedicated capacity is available with the specified rate of I/O operations, the dedicated capacity being available at any time for use by the user; enable the user to purchase reserved capacity for fulfilling at least a portion of the instance requests if reserved capacity is available with the specified rate of I/O operations, the user being given priority to use the reserved capacity over other users; and if a user is unable to purchase dedicated or reserved capacity, in response to a subsequent instance request: enable the user to bid on excess capacity for fulfilling at least a portion of the instance request if excess capacity is available with the specified rate of I/O operations at substantially a time of submission of the instance request, the excess capacity being available when a bid price for the user at least meets a market price for the excess capacity, the user being able to utilize the excess capacity for at least a period of time when the bid price meets at least one selection criterion; and enable the user to utilize available variable on-demand capacity for fulfilling at least a portion of the instance request if available variable on-demand capacity is available.
19. The non-transitory computer-readable medium of claim 18, wherein the instructions further cause the at least one computing device to: enable the user to specify another type of capacity to use to fulfill any instance requests that exceed an amount of capacity specified by the user for at least a portion of the fulfillment.
20. The non-transitory computer-readable medium of claim 18, wherein the instructions further cause the at least one computing device to: enable the user to dynamically adjust the bid price for the excess capacity in order to continue fulfillment for the instance request using the excess capacity; and enable the user to submit a plurality of bids for excess capacity, each bid having a bid price based on a combination of levels of capacity for multiple categories of resource capacity.